fairpy.model

class fairpy.model.DIRemover(s_idx: int | Sequence[int], repair_feat_idx: int | Sequence[int] | None = None, num_feat_idx: int | Sequence[int] | None = None, repair_level: float = 1.0)

Certifying and Removing Disparate Impact

Reference:

https://dl.acm.org/doi/pdf/10.1145/2783258.2783311

Code adapted from:

https://github.com/algofairness/BlackBoxAuditing

# TODO: fix bugs related to NumPy data types

Attributes

classes_ : ndarray of shape (n_classes,)

A list of class labels known to the classifier.

s_classes_ : ndarray of shape (n_sensitive_group,)

A list of sensitive classes known to DIRemover during fit.

n_features_in_ : int

Number of features seen during fit.

feature_names_in_ : ndarray of shape (n_features_in_,)

Names of features seen during fit. Defined only when X has feature names that are all strings.

s_idx_ : list of int

The index(es) of the sensitive attribute(s) in the data matrix used to repair data.

repair_feat_idx_ : list of int

The index(es) of the feature(s) to be repaired in the data matrix.

num_feat_idx_ : list of int

The index(es) of the numerical feature(s) in the data matrix.

Examples

>>> from fairpy.dataset import Adult
>>> from fairpy.model import DIRemover
>>> dataset = Adult()
>>> split_data = dataset.split()
>>> model = DIRemover(s_idx=dataset.feat_idx.sen_idx, num_feat_idx=0)
>>> new_X_train = model.fit_transform(split_data.X_train)

fit_transform(X: ArrayLike) -> ndarray[Any, dtype[ScalarType]]

Fit DIRemover and return the repaired data matrix.

Parameters

X : array-like of shape (n_samples, n_features)

Training vector, where n_samples is the number of samples and n_features is the number of features.

Returns

X : ndarray of shape (n_samples, n_features)

Repaired data matrix.
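To illustrate what "repair" means in Feldman et al., the following is a minimal NumPy sketch for a single numerical feature. It assumes that each value is mapped to its within-group quantile (by rank), that the fully repaired value is the median across groups of every group's quantile function at that quantile, and that a `repair_level` below 1 linearly interpolates toward it. `repair_feature` is a hypothetical helper, not part of fairpy and not the BlackBoxAuditing code.

```python
import numpy as np


def repair_feature(x, s, repair_level=1.0):
    """Hypothetical helper: repair one numerical feature x with respect to groups s.

    Each value is mapped to its quantile within its own group; the fully
    repaired value is the median, across groups, of every group's value at
    that quantile. repair_level in [0, 1] blends original and repaired values.
    """
    x = np.asarray(x, dtype=float)
    s = np.asarray(s)
    groups = np.unique(s)
    values = {g: x[s == g] for g in groups}

    repaired = np.empty_like(x)
    for g in groups:
        idx = np.flatnonzero(s == g)
        # Within-group quantile in [0, 1], taken from the rank of each value.
        ranks = np.argsort(np.argsort(x[idx], kind="stable"), kind="stable")
        q = ranks / max(len(idx) - 1, 1)
        # Evaluate every group's empirical quantile function at q ...
        at_q = np.stack([np.quantile(values[h], q) for h in groups])
        # ... and use the median across groups as the repaired value.
        repaired[idx] = np.median(at_q, axis=0)

    # Partial repair: linear interpolation toward the fully repaired value.
    return (1.0 - repair_level) * x + repair_level * repaired
```

With `repair_level=1.0`, every group ends up with the same distribution of repaired values. Smaller values keep more of the original data at the cost of a less complete repair.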

class fairpy.model.EqOddsCalib

Equality of Opportunity in Supervised Learning

Reference:

https://arxiv.org/pdf/1610.02413.pdf

Code adapted from:

https://github.com/gpleiss/equalized_odds_and_calibration

Attributes

classes_ : ndarray of shape (n_classes,)

A list of class labels known to the classifier.

s_classes_ : ndarray of shape (n_sensitive_group,)

A list of sensitive classes known to EqOddsCalib during fit.

n_features_in_ : int

Number of features seen during fit.

feature_names_in_ : ndarray of shape (n_features_in_,)

Names of features seen during fit. Defined only when X has feature names that are all strings.

Examples

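>>> from sklearn.linear_model import LogisticRegression
>>> from fairpy.dataset import Adult
>>> from fairpy.model import EqOddsCalib
>>> dataset = Adult()
>>> split_data = dataset.split()
>>> clf = LogisticRegression().fit(split_data.X_train, split_data.y_train)
>>> train_pred = clf.predict_proba(split_data.X_train)[:, 1]
>>> test_pred = clf.predict_proba(split_data.X_test)[:, 1]
>>> model = EqOddsCalib()
>>> model.fit(pred=train_pred, y=split_data.y_train, s=split_data.s_train)
>>> fair_pred = model.transform(pred=test_pred, s=split_data.s_test)

static base_rate(label)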

Fraction of samples belonging to the positive class.

fit(pred: ArrayLike, y: ArrayLike, s: ArrayLike) -> EqOddsCalib

Fit the model according to the given training data.

Parameters

pred : array-like of shape (n_samples,)

Predictions to be calibrated, where n_samples is the number of samples.

y : array-like of shape (n_samples,)

Target vector relative to predictions.

s : array-like of shape (n_samples,)

Sensitive attributes relative to pred.

Returns

self

Fitted estimator.

static fn_cost(pred, label)

Generalized false negative cost

static fnr(pred, label)

False negative rate

static fp_cost(pred, label)

Generalized false positive cost

static fpr(pred, label)

False positive rate

static tnr(pred, label)

True negative rate

static tpr(pred, label)

True positive rate
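The static helpers above are one-line statistics. A common definition of the generalized false positive cost of a score-valued classifier is its mean score on negative examples, and the generalized false negative cost is the mean of one minus the score on positive examples. The sketch below assumes scores in [0, 1], binary 0/1 labels, and thresholding by rounding for the hard rates; the function names mirror the methods above, but the bodies are illustrative assumptions, not fairpy's code.

```python
import numpy as np


def base_rate(label):
    # Fraction of samples whose label is the positive class (1).
    return float(np.mean(np.asarray(label) == 1))


def tpr(pred, label):
    # True positive rate after thresholding scores by rounding (0.5 rounds to 0).
    pred, label = np.asarray(pred, dtype=float), np.asarray(label)
    return float(np.mean(np.round(pred[label == 1]) == 1))


def fnr(pred, label):
    return 1.0 - tpr(pred, label)


def fpr(pred, label):
    # False positive rate: share of negatives that are predicted positive.
    pred, label = np.asarray(pred, dtype=float), np.asarray(label)
    return float(np.mean(np.round(pred[label == 0]) == 1))


def tnr(pred, label):
    return 1.0 - fpr(pred, label)


def fp_cost(pred, label):
    # Generalized false positive cost: mean score on negative examples.
    pred, label = np.asarray(pred, dtype=float), np.asarray(label)
    return float(np.mean(pred[label == 0]))


def fn_cost(pred, label):
    # Generalized false negative cost: mean of (1 - score) on positive examples.
    pred, label = np.asarray(pred, dtype=float), np.asarray(label)
    return float(np.mean(1.0 - pred[label == 1]))
```

Unlike the hard rates, the generalized costs change smoothly with the scores, which makes them usable on calibrated probabilities without choosing a threshold.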

transform(pred: ArrayLike, s: ArrayLike) -> ndarray[Any, dtype[ScalarType]]

Transform predictions to be fair.

Parameters

pred : array-like of shape (n_samples,)

Predictions to be calibrated, where n_samples is the number of samples.

s : array-like of shape (n_samples,)

Sensitive attributes relative to pred.

Returns

pred : ndarray of shape (n_samples,)

Debiased predictions.

class fairpy.model.FairCstr(cstr: str = 'fair', sep_cstr: bool = False, gamma: float = 0.5, max_iter: int = 100000)

Fairness Constraints: Mechanisms for Fair Classification

This implementation does not currently support the ‘sep_constraint’ functionality of the original implementation.

Reference:

http://proceedings.mlr.press/v54/zafar17a/zafar17a.pdf

Code adapted from:

https://github.com/mbilalzafar/fair-classification

Attributes

classes_ : ndarray of shape (n_classes,)

A list of class labels known to the classifier.

s_classes_ : ndarray of shape (n_sensitive_group,)

A list of sensitive classes known to FairCstr during training.

n_features_in_ : int

Number of features seen during fit.

feature_names_in_ : ndarray of shape (n_features_in_,)

Names of features seen during fit. Defined only when X has feature names that are all strings.

coef_ : ndarray of shape (1, n_features)

Coefficient of the features in the decision function.

Examples

>>> from fairpy.dataset import Adult
>>> from fairpy.model import FairCstr
>>> dataset = Adult()
>>> split_data = dataset.split()
>>> model = FairCstr()
>>> model.fit(split_data.X_train, split_data.y_train, split_data.s_train)
>>> model.predict(split_data.X_test)

decision_function(X: ArrayLike) -> ndarray[Any, dtype[float64]]

Predict confidence scores for samples.

Parameters

X : array-like of shape (n_samples, n_features)

The data matrix for which we want to get the confidence scores.

Returns

scores : ndarray of shape (n_samples,)

Confidence score for each sample. In the binary case, this is the confidence score for self.classes_[1], where a score > 0 means this class would be predicted.

fit(X: ArrayLike, y: ArrayLike, s: ArrayLike) -> FairCstr

Fit the model according to the given training data.

Parameters

X : array-like of shape (n_samples, n_features)

Training vector, where n_samples is the number of samples and n_features is the number of features.

y : array-like of shape (n_samples,)

Target vector relative to X.

s : array-like of shape (n_samples,)

Sensitive attributes relative to X.

Returns

self

Fitted estimator.
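Zafar et al. use the empirical covariance between the sensitive attribute and the decision values of a linear model as a proxy for disparate impact; because it is linear in the weights, bounding its magnitude gives a convex constraint. The sketch below computes that quantity for a model without intercept and a binary 0/1 sensitive attribute. `decision_boundary_covariance` is a hypothetical helper, not fairpy's solver.

```python
import numpy as np


def decision_boundary_covariance(theta, X, s):
    """Hypothetical helper: empirical Cov(s, theta^T x) for a linear model.

    Values near zero mean the decision values are uncorrelated with the
    (binary, 0/1) sensitive attribute s.
    """
    X = np.asarray(X, dtype=float)
    s = np.asarray(s, dtype=float)
    d = X @ np.asarray(theta, dtype=float)  # decision values theta^T x
    return float(np.mean((s - s.mean()) * d))
```

A fairness-constrained fit would then, roughly, minimize the classification loss subject to the absolute value of this covariance staying below a chosen threshold.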

predict(X: ArrayLike) -> ndarray[Any, dtype[Any]]

Predict class labels for samples in X.

Parameters

X : array-like of shape (n_samples, n_features)

The data matrix for which we want to get the predictions.

Returns

y_pred : ndarray of shape (n_samples,)

Vector containing the class labels for each sample.

class fairpy.model.FairGLM(solver: str = 'CG', fit_intercept: bool = True, max_iter: int = 100, lam: float = 0.001, tol: float = 0.0001)

“Fair Generalized Linear Models with a Convex Penalty”

Reference:

https://proceedings.mlr.press/v162/do22a/do22a.pdf

Code adapted from:

https://github.com/hyungrok-do/fair-glm-cvx

# TODO: add support for multi-class classification and regression

Attributes

classes_ : ndarray of shape (n_classes,)

A list of class labels known to the classifier.

s_classes_ : ndarray of shape (n_sensitive_group,)

A list of sensitive classes known to the classifier.

n_features_in_ : int

Number of features seen during fit.

feature_names_in_ : ndarray of shape (n_features_in_,)

Names of features seen during fit. Defined only when X has feature names that are all strings.

coef_ : ndarray of shape (1, n_features)

Coefficient of the features in the decision function.

intercept_ : ndarray of shape (1,)

Intercept (a.k.a. bias) added to the decision function.

Examples

>>> from fairpy.dataset import Adult
>>> from fairpy.model import FairGLM
>>> dataset = Adult()
>>> split_data = dataset.split()
>>> model = FairGLM()
>>> model.fit(split_data.X_train, split_data.y_train, split_data.s_train)
>>> model.predict(split_data.X_test)

decision_function(X: ArrayLike) -> ndarray[Any, dtype[float64]]

Predict confidence scores for samples.

Parameters

X : array-like of shape (n_samples, n_features)

The data matrix for which we want to get the confidence scores.

Returns

scores : ndarray of shape (n_samples,)

Confidence score for each sample. In the binary case, this is the confidence score for self.classes_[1], where a score > 0 means this class would be predicted.

fit(X: ArrayLike, y: ArrayLike, s: ArrayLike)

Fit the model according to the given training data.

Parameters

X : array-like of shape (n_samples, n_features)

Training vector, where n_samples is the number of samples and n_features is the number of features.

y : array-like of shape (n_samples,)

Target vector relative to X.

s : array-like of shape (n_samples,)

Sensitive attributes relative to X.

Returns

self

Fitted estimator.

predict(X: ArrayLike) -> ndarray[Any, dtype[Any]]

Predict class labels for samples in X.

Parameters

X : array-like of shape (n_samples, n_features)

The data matrix for which we want to get the predictions.

Returns

y_pred : ndarray of shape (n_samples,)

Vector containing the class labels for each sample.

predict_proba(X: ArrayLike) -> ndarray[Any, dtype[float64]]

Probability estimates.

The returned estimates for all classes are ordered by the label of classes.

Parameters

X : array-like of shape (n_samples, n_features)

Vector to be scored, where n_samples is the number of samples and n_features is the number of features.

Returns

T : array-like of shape (n_samples, n_classes)

Returns the probability of the sample for each class in the model, where classes are ordered as they are in self.classes_.

class fairpy.model.FairRank(K: int, P: float, alpha: float)

FA*IR: A Fair Top-k Ranking Algorithm

Reference:

https://dl.acm.org/doi/pdf/10.1145/3132847.3132938

Code adapted from:

https://github.com/fair-search/fairsearch-fair-python

Attributes

s_classes_ : ndarray of shape (n_sensitive_group,)

A list of sensitive classes known to FairRank.

Examples

>>> from fairpy.model import FairRank
>>> model = FairRank(K=5, P=0.5, alpha=0.10)
>>> scores = [0.98, 0.97, 0.85, 0.84, 0.83, 0.55]
>>> s = ["male", "male", "male", "female", "female", "female"]
>>> fair_rank = model.transform(scores=scores, s=s)

transform(scores: ArrayLike, s: ArrayLike) -> List[int]

Transform the ranking to be fair based on scores.

Parameters

scores : array-like of shape (n_samples,)

Scores for ranking samples, where n_samples is the number of samples.

s : array-like of shape (n_samples,)

Sensitive attributes relative to scores.

Returns

rank : list of int

Fair ranking, given as indices into scores.
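FA*IR requires every prefix of length k of the top-K list to contain at least m(k) protected candidates, where m(k) is the alpha-quantile of a Binomial(k, P) distribution, and fills the list greedily by score subject to that table. The standard-library sketch below illustrates the idea; it omits the paper's multiple-testing adjustment of alpha, and `fair_topk` with its boolean `protected` input is hypothetical rather than fairpy's interface.

```python
from math import comb
from typing import List, Sequence


def binom_ppf(q: float, n: int, p: float) -> int:
    """Smallest m with P[Binomial(n, p) <= m] >= q."""
    cdf = 0.0
    for m in range(n + 1):
        cdf += comb(n, m) * p**m * (1 - p) ** (n - m)
        if cdf >= q:
            return m
    return n


def fair_topk(scores: Sequence[float], protected: Sequence[bool],
              K: int, P: float, alpha: float) -> List[int]:
    """Hypothetical greedy FA*IR-style top-K (no alpha adjustment)."""
    # mtable[k] = minimum number of protected items in the top k + 1 positions.
    mtable = [binom_ppf(alpha, k, P) for k in range(1, K + 1)]
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    prot = [i for i in order if protected[i]]
    other = [i for i in order if not protected[i]]

    ranking: List[int] = []
    n_prot = 0
    for k in range(K):
        if not prot and not other:
            break
        must_take_prot = n_prot < mtable[k]
        # Take a protected item if required, if it scores higher, or if no
        # non-protected items remain; otherwise take the best non-protected one.
        if prot and (must_take_prot or not other or scores[prot[0]] > scores[other[0]]):
            ranking.append(prot.pop(0))
            n_prot += 1
        else:
            ranking.append(other.pop(0))
    return ranking
```

For example, with P=0.5 and alpha=0.1 the table first requires a protected candidate at position 4, so a protected item can move ahead of higher-scoring non-protected ones there.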

class fairpy.model.IFair(s_idx: int | Sequence[int] | None = None, K: int = 2, max_iter: int = 200, restarts: int = 3, epsilon: float = 0.0001, w_recon: float = 1.0, w_fair: float = 1.0)

iFair: Learning Individually Fair Data Representations for Algorithmic Decision Making

Reference:

https://arxiv.org/pdf/1806.01059.pdf

Code adapted from:

https://github.com/plahoti-lgtm/iFair

The time complexity of each optimization iteration is O(N^2), where N is the number of samples.

Attributes

n_features_in_ : int

Number of features seen during fit.

feature_names_in_ : ndarray of shape (n_features_in_,)

Names of features seen during fit. Defined only when X has feature names that are all strings.

s_idx_ : list of int

The index(es) of the sensitive attribute(s) in the data matrix.

opt_params_ : ndarray of shape (n_features * n_centroids + n_features,)

Coefficients solved for the probabilistic clustering.

Examples

>>> from fairpy.dataset import Adult
>>> from fairpy.model import IFair
>>> dataset = Adult()
>>> split_data = dataset.split()
>>> model = IFair()
>>> model.fit(split_data.X_train)
>>> fair_data = model.transform(split_data.X_train)

fit(X: ArrayLike) -> IFair

Fit the model according to the given training data.

Parameters

X : array-like of shape (n_samples, n_features)

Training vector, where n_samples is the number of samples and n_features is the number of features.

Returns

self

Fitted estimator.

fit_transform(X: ArrayLike) -> ndarray[Any, dtype[ScalarType]]

Fit the estimator and transform data matrix to fair data.

Parameters

X : array-like of shape (n_samples, n_features)

Training vector, where n_samples is the number of samples and n_features is the number of features.

Returns

X : ndarray of shape (n_samples, n_features)

Debiased training vector.

transform(X: ArrayLike) -> ndarray[Any, dtype[ScalarType]]

Transform data matrix to fair data.

Parameters

X : array-like of shape (n_samples, n_features)

Data matrix to be transformed, where n_samples is the number of samples and n_features is the number of features.

Returns

X : ndarray of shape (n_samples, n_features)

Debiased data matrix.

class fairpy.model.LabelBias(metric: str = 'dp', estimator: Any | None = None, max_iter: int = 100, tol: float = 0.001, lr: float = 1.0)

“Identifying and Correcting Label Bias in Machine Learning”

Adaptively learns weights for the sensitive groups by fitting the sub-estimator multiple times.
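In Jiang and Nachum's method, each sensitive group k carries a multiplier lambda_k; sample weights are derived from the multipliers, the sub-estimator is refit on the weighted data, and each multiplier moves in proportion to its group's constraint violation. The NumPy sketch below shows one weight computation and one demographic-parity update, assuming a one-hot group matrix and a sign convention in which positives are up-weighted for groups with too few positive predictions. The helper names and sign convention are assumptions and may differ from fairpy's internals.

```python
import numpy as np


def label_bias_weights(y, s_onehot, multipliers):
    """Hypothetical helper: per-sample weights from group multipliers.

    y is a 0/1 label vector; s_onehot has shape (n_samples, n_groups).
    """
    s_onehot = np.asarray(s_onehot, dtype=float)
    z = s_onehot @ np.asarray(multipliers, dtype=float)  # lambda of each sample's group
    p = 1.0 / (1.0 + np.exp(-z))                         # sigmoid(z)
    # Positives get weight p and negatives 1 - p, so a larger lambda_k shifts
    # weight toward the positive examples of group k.
    return np.where(np.asarray(y) == 1, p, 1.0 - p)


def update_multipliers(pred, s_onehot, multipliers, lr=1.0):
    """Hypothetical helper: one demographic-parity step on the multipliers."""
    s_onehot = np.asarray(s_onehot, dtype=float)
    pred = np.asarray(pred, dtype=float)
    group_rate = (s_onehot * pred[:, None]).sum(axis=0) / s_onehot.sum(axis=0)
    # Positive violation: group k receives fewer positive predictions than average.
    violation = pred.mean() - group_rate
    return np.asarray(multipliers, dtype=float) + lr * violation
```

A full fit would alternate these two steps with refitting the sub-estimator on the weighted data (for a scikit-learn classifier, through the `sample_weight` argument of `fit`), which is presumably the role of the `max_iter`, `tol` and `lr` constructor arguments.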

Reference:

http://proceedings.mlr.press/v108/jiang20a/jiang20a.pdf

Code adapted from:

https://github.com/google-research/google-research/tree/master/label_bias

# TODO: add support for equalized odds

Attributes

classes_ : ndarray of shape (n_classes,)

A list of class labels known to the classifier.

s_classes_ : ndarray of shape (n_sensitive_group,)

A list of sensitive classes known to LabelBias during training.

n_features_in_ : int

Number of features seen during fit.

feature_names_in_ : ndarray of shape (n_features_in_,)

Names of features seen during fit. Defined only when X has feature names that are all strings.

weights_ : ndarray of shape (n_samples,)

Weights for the training samples, learned by LabelBias.

fitted_estimator_ : classifier object

Fitted estimator.

Examples

>>> from fairpy.dataset import Adult
>>> from fairpy.model import LabelBias
>>> dataset = Adult()
>>> split_data = dataset.split()
>>> model = LabelBias()
>>> model.fit(split_data.X_train, split_data.y_train, split_data.s_train)
>>> model.predict(split_data.X_test)

fit(X: ArrayLike, y: ArrayLike, s: ArrayLike) -> LabelBias

Fit the model according to the given training data.

Parameters

X : array-like of shape (n_samples, n_features)

Training vector, where n_samples is the number of samples and n_features is the number of features.

y : array-like of shape (n_samples,)

Target vector relative to X.

s : array-like of shape (n_samples,)

Sensitive attributes relative to X.

Returns

self

Fitted estimator.

predict(X: ArrayLike) -> ndarray[Any, dtype[Any]]

Predict class labels for samples in X.

Parameters

X : array-like of shape (n_samples, n_features)

The data matrix for which we want to get the predictions.

Returns

y_pred : ndarray of shape (n_samples,)

Vector containing the class labels for each sample.

predict_proba(X: ndarray) -> Any

Probability estimates. Only available if ‘predict_proba’ is implemented by the estimator.

Parameters

X : array-like of shape (n_samples, n_features)

Vector to be scored, where n_samples is the number of samples and n_features is the number of features.

Returns

T : array-like of shape (n_samples, n_classes)

Returns the probability of the sample for each class in the model, where classes are ordered as they are in self.classes_.

class fairpy.model.LinearFairERM(estimator: Any | None = None)

Empirical risk minimization under fairness constraints

Reference:

https://proceedings.neurips.cc/paper/2018/file/83cdcec08fbf90370fcf53bdd56604ff-Paper.pdf

Code adapted from:

https://github.com/jmikko/fair_ERM

Attributes

classes_ : ndarray of shape (n_classes,)

A list of class labels known to the classifier.

s_classes_ : ndarray of shape (n_sensitive_group,)

A list of sensitive classes known to LinearFairERM during training.

fitted_estimator_ : classifier object

Fitted estimator.

n_features_in_ : int

Number of features seen during fit.

feature_names_in_ : ndarray of shape (n_features_in_,)

Names of features seen during fit. Defined only when X has feature names that are all strings.

Examples

>>> from fairpy.dataset import Adult
>>> from fairpy.model import LinearFairERM
>>> dataset = Adult()
>>> split_data = dataset.split()
>>> model = LinearFairERM()
>>> model.fit(split_data.X_train, split_data.y_train, split_data.s_train)
>>> model.predict(split_data.X_test)

decision_function(X) -> Any

Predict confidence scores for samples.

Parameters

X : array-like of shape (n_samples, n_features)

The data matrix for which we want to get the confidence scores.

Returns

scores : ndarray of shape (n_samples,)

Confidence score for each sample. In the binary case, this is the confidence score for self.classes_[1], where a score > 0 means this class would be predicted.

feat_trans(X: ArrayLike) -> ndarray[Any, dtype[ScalarType]]

Feature transformation for fairness.

Parameters

Xarray-like of shape (n_samples, n_features)

The data matrix for which we want to get the transformations.

Returns

trans_Xndarray of shape (n_samples, n_transformed_features)

The transformed data matrix, with one row per sample of X.
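In the linear variant of the referenced method, the fairness constraint reduces to requiring the weight vector to be orthogonal to a direction u. That constraint can be enforced by a feature transformation. Each sample x becomes x - u * x[i] / u[i] for a pivot index i. This makes feature i identically zero, so it can be dropped. The following minimal sketch applies that transformation. It assumes `u` and the pivot `max_i` were estimated during fit, and it is not necessarily fairpy's exact code. The function name is hypothetical.

```python
import numpy as np


def ferm_transform(X, u, max_i):
    """Project out the direction u and drop the now-zero pivot column."""
    X = np.asarray(X, dtype=float)
    u = np.asarray(u, dtype=float)
    # Subtract x[max_i] / u[max_i] times u from each row; column max_i becomes 0.
    X_new = X - np.outer(X[:, max_i] / u[max_i], u)
    # The pivot column carries no information any more, so remove it.
    return np.delete(X_new, max_i, axis=1)


trans_X = ferm_transform([[1.0, 2.0], [3.0, 4.0]], u=[2.0, 1.0], max_i=0)
print(trans_X)  # [[1.5] [2.5]]
```

The output has one column fewer than the input, and a linear model trained on it corresponds to a weight vector orthogonal to u in the original space.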

fit(X: ArrayLike, y: ArrayLike, s: ArrayLike) LinearFairERM

Fit the model according to the given training data.

Parameters

Xarray-like of shape (n_samples, n_features)

Training vector, where n_samples is the number of samples and n_features is the number of features.

yarray-like of shape (n_samples,)

Target vector relative to X.

sarray-like of shape (n_samples,)

Sensitive attributes relative to X.

Returns

self

Fitted estimator.
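For the linear method in the referenced paper, the direction u used by the feature transformation is the difference between the mean feature vectors of the positive examples in the two sensitive groups. The following minimal sketch estimates u and a pivot index from X, y and s. It assumes binary sensitive groups and a positive label of 1. It also picks the entry of largest magnitude as the pivot so that the later division by u[max_i] is well conditioned; the reference code may choose the pivot differently. The helper is hypothetical, and a linear classifier would then be trained on the transformed data.

```python
import numpy as np


def estimate_ferm_direction(X, y, s, positive_label=1):
    """Difference of positive-class feature means between two groups."""
    X = np.asarray(X, dtype=float)
    y, s = np.asarray(y), np.asarray(s)
    g0, g1 = np.unique(s)  # unpacking fails unless there are exactly two groups
    mean_0 = X[(y == positive_label) & (s == g0)].mean(axis=0)
    mean_1 = X[(y == positive_label) & (s == g1)].mean(axis=0)
    u = mean_0 - mean_1
    # Pivot on the largest-magnitude entry to avoid dividing by a tiny u[i].
    max_i = int(np.argmax(np.abs(u)))
    return u, max_i


u, max_i = estimate_ferm_direction(
    X=[[1.0, 0.0], [3.0, 2.0], [0.0, 5.0], [2.0, 1.0]],
    y=[1, 1, 1, 0],
    s=[0, 0, 1, 1],
)
print(u, max_i)  # [ 2. -4.] 1
```

Flipping the sign of u or rescaling it leaves the transformation x - u * x[i] / u[i] unchanged, so the direction of the difference does not matter.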

predict(X) Any

Predict class labels for samples in X.

Parameters

Xarray-like of shape (n_samples, n_features)

The data matrix for which we want to get the predictions.

Returns

y_predndarray of shape (n_samples,)

Vector containing the class labels for each sample.

predict_log_proba(X) Any

Predict logarithm of probability estimates.

The returned estimates for all classes are ordered by the label of classes.

Parameters

Xarray-like of shape (n_samples, n_features)

Vector to be scored, where n_samples is the number of samples and n_features is the number of features.

Returns

Tarray-like of shape (n_samples, n_classes)

Returns the logarithm of probability of the sample for each class in the model, where classes are ordered as they are in self.classes_.
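Log-probabilities are usually computed directly in log space, because taking the logarithm of a probability that has underflowed to 0 gives -inf. As an illustration only, and not necessarily how the classifier wrapped by LinearFairERM computes them, the following sketch obtains log-probabilities from per-class scores with a numerically stable log-softmax. The function name is hypothetical.

```python
import numpy as np


def log_proba_from_scores(scores):
    """Row-wise log-softmax: log p_k = z_k - log(sum_j exp(z_j))."""
    z = np.asarray(scores, dtype=float)
    z_max = z.max(axis=1, keepdims=True)
    # Subtracting the row maximum keeps exp() from overflowing.
    log_norm = z_max + np.log(np.exp(z - z_max).sum(axis=1, keepdims=True))
    return z - log_norm


print(log_proba_from_scores([[1000.0, 0.0]]))  # [[    0. -1000.]]
```

A naive `np.log(np.exp(z) / np.exp(z).sum())` would overflow on this input.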

predict_proba(X) Any

Probability estimates.

The returned estimates for all classes are ordered by the label of classes.

Parameters

Xarray-like of shape (n_samples, n_features)

Vector to be scored, where n_samples is the number of samples and n_features is the number of features.

Returns

Tarray-like of shape (n_samples, n_classes)

Returns the probability of the sample for each class in the model, where classes are ordered as they are in self.classes_.
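Because the columns follow the order of self.classes_, the column for a given label has to be looked up through that array rather than assumed. The following minimal sketch shows the lookup and the usual argmax rule for recovering labels from the probability matrix. Both helpers are hypothetical.

```python
import numpy as np


def proba_of_class(proba, classes, label):
    """Probability column for `label`, located via the classes_ order."""
    classes = np.asarray(classes)
    # IndexError here means `label` is not one of the known classes.
    col = int(np.flatnonzero(classes == label)[0])
    return np.asarray(proba)[:, col]


def labels_from_proba(proba, classes):
    """Most probable class per row, mapped back through classes_."""
    return np.asarray(classes)[np.argmax(proba, axis=1)]


proba = [[0.2, 0.8], [0.9, 0.1]]
print(proba_of_class(proba, classes=[0, 1], label=1))  # [0.8 0.1]
print(labels_from_proba(proba, classes=[0, 1]))        # [1 0]
```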