fairpy.model
- class fairpy.model.DIRemover(s_idx: int | Sequence[int], repair_feat_idx: int | Sequence[int] | None = None, num_feat_idx: int | Sequence[int] | None = None, repair_level: float = 1.0)
Certifying and Removing Disparate Impact
- Reference:
- Code adopted from:
# TODO: bugs on data types in numpy
Attributes
- classes_ : ndarray of shape (n_classes,)
A list of class labels known to the classifier.
- s_classes_ : ndarray of shape (n_sensitive_group,)
A list of sensitive classes known to DIRemover during training.
- n_features_in_ : int
Number of features seen during fit.
- feature_names_in_ : ndarray of shape (n_features_in_,)
Names of features seen during fit. Defined only when X has feature names that are all strings.
- s_idx_ : list of int
The index(es) of the sensitive attribute(s) in the data matrix used to repair data.
- repair_feat_idx_ : list of int
The index(es) of the feature(s) to be repaired in the data matrix.
- num_feat_idx_ : list of int
The index(es) of the numerical feature(s) in the data matrix.
Examples
>>> from fairpy.dataset import Adult
>>> from fairpy.model import DIRemover
>>> dataset = Adult()
>>> split_data = dataset.split()
>>> model = DIRemover(s_idx=dataset.feat_idx.sen_idx, num_feat_idx=0)
>>> new_X_train = model.fit_transform(split_data.X_train)
- fit_transform(X: ArrayLike) → ndarray[Any, dtype[ScalarType]]
Fit DIRemover and return repaired data matrix.
Parameters
- X : array-like of shape (n_samples, n_features)
Training vector, where n_samples is the number of samples and n_features is the number of features.
Returns
- X : array-like of shape (n_samples, n_features)
Repaired data matrix.
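The repair step can be pictured with a small NumPy sketch. It is not fairpy's implementation: `repair_feature` is a hypothetical helper that assumes one numeric feature and one sensitive attribute. It follows the quantile-matching idea of the referenced paper, in which each group's distribution is moved toward a common median distribution and `repair_level` interpolates between the original and the fully repaired values.

```python
import numpy as np


def repair_feature(x, s, repair_level=1.0):
    """Move each group's values of one feature toward a shared distribution.

    Hypothetical helper for illustration only, not part of fairpy.
    """
    x = np.asarray(x, dtype=float)
    s = np.asarray(s)
    groups = np.unique(s)
    repaired = x.copy()
    for g in groups:
        mask = s == g
        xg = x[mask]
        # Quantile rank of every value inside its own group, scaled to [0, 1].
        ranks = np.argsort(np.argsort(xg, kind="stable"), kind="stable")
        q = ranks / max(len(xg) - 1, 1)
        # Target value: median over groups of each group's value at that quantile.
        per_group = np.stack([np.quantile(x[s == h], q) for h in groups])
        target = np.median(per_group, axis=0)
        # repair_level = 0 keeps the data, repair_level = 1 repairs it fully.
        repaired[mask] = (1.0 - repair_level) * xg + repair_level * target
    return repaired
```

With `repair_level=0.0` the feature is returned unchanged; with `repair_level=1.0` the groups share the same value at every quantile, so the feature no longer separates them.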
- class fairpy.model.EqOddsCalib
Equality of Opportunity in Supervised Learning
- Reference:
- Code adopted from:
Attributes
- classes_ : ndarray of shape (n_classes,)
A list of class labels known to the classifier.
- s_classes_ : ndarray of shape (n_sensitive_group,)
A list of sensitive classes known to EqOddsCalib during training.
- n_features_in_ : int
Number of features seen during fit.
- feature_names_in_ : ndarray of shape (n_features_in_,)
Names of features seen during fit. Defined only when X has feature names that are all strings.
Examples
>>> from fairpy.dataset import Adult
>>> from fairpy.model import EqOddsCalib
>>> dataset = Adult()
>>> split_data = dataset.split()
>>> # pred_train and pred_test are predictions from a base classifier, computed beforehand
>>> model = EqOddsCalib()
>>> model.fit(pred=pred_train, y=split_data.y_train, s=split_data.s_train)
>>> fair_pred = model.transform(pred=pred_test, s=split_data.s_test)
- static base_rate(label)
Percentage of samples belonging to the positive class
- fit(pred: ArrayLike, y: ArrayLike, s: ArrayLike) → EqOddsCalib
Fit the model according to the given training data.
Parameters
- pred : array-like of shape (n_samples,)
Predictions to be calibrated, where n_samples is the number of samples.
- y : array-like of shape (n_samples,)
Target vector relative to predictions.
- s : array-like of shape (n_samples,)
Sensitive attributes relative to pred.
Returns
- self
Fitted estimator.
- static fn_cost(pred, label)
Generalized false negative cost
- static fnr(pred, label)
False negative rate
- static fp_cost(pred, label)
Generalized false positive cost
- static fpr(pred, label)
False positive rate
- static tnr(pred, label)
True negative rate
- static tpr(pred, label)
True positive rate
- transform(pred: ArrayLike, s: ArrayLike) → ndarray[Any, dtype[ScalarType]]
Transform predictions to be fair.
Parameters
- pred : array-like of shape (n_samples,)
Predictions to be calibrated, where n_samples is the number of samples.
- s : array-like of shape (n_samples,)
Sensitive attributes relative to pred.
Returns
- pred : ndarray of shape (n_samples,)
Debiased predictions.
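The static helpers of EqOddsCalib (base_rate, tpr, fpr, tnr, fnr, fp_cost, fn_cost) are documented by name only. The sketch below shows the usual textbook definitions for a single group. It is an assumption about what such quantities typically mean, not a copy of fairpy's code: `group_rates` and the 0.5 `threshold` are hypothetical, and the generalized costs are taken as the mean score on negatives and the mean of one minus the score on positives.

```python
import numpy as np


def group_rates(pred, label, threshold=0.5):
    """Return classification rates and generalized costs for one group.

    Hypothetical helper; fairpy may compute these quantities differently.
    """
    pred = np.asarray(pred, dtype=float)
    label = np.asarray(label).astype(bool)
    hard = pred >= threshold  # hard decisions derived from the scores
    pos, neg = label, ~label
    return {
        "base_rate": float(label.mean()),              # share of positive labels
        "tpr": float(hard[pos].mean()),                # positives predicted positive
        "fnr": float((~hard[pos]).mean()),             # positives predicted negative
        "fpr": float(hard[neg].mean()),                # negatives predicted positive
        "tnr": float((~hard[neg]).mean()),             # negatives predicted negative
        "fp_cost": float(pred[neg].mean()),            # mean score given to negatives
        "fn_cost": float((1.0 - pred[pos]).mean()),    # mean score withheld from positives
    }
```

The generalized costs use the raw scores rather than thresholded decisions, so they stay informative for probabilistic predictions.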
- class fairpy.model.FairCstr(cstr: str = 'fair', sep_cstr: bool = False, gamma: float = 0.5, max_iter: int = 100000)
Fairness Constraints: Mechanisms for Fair Classification
The ‘sep_constraint’ functionality of the original implementation is not currently supported.
- Reference:
- Code adopted from:
Attributes
- classes_ : ndarray of shape (n_classes,)
A list of class labels known to the classifier.
- s_classes_ : ndarray of shape (n_sensitive_group,)
A list of sensitive classes known to FairCstr during training.
- n_features_in_ : int
Number of features seen during fit.
- feature_names_in_ : ndarray of shape (n_features_in_,)
Names of features seen during fit. Defined only when X has feature names that are all strings.
- coef_ : ndarray of shape (1, n_features)
Coefficient of the features in the decision function.
Examples
>>> from sklearn.preprocessing import StandardScaler
>>> from fairpy.dataset import Adult
>>> from fairpy.model import FairCstr
>>> dataset = Adult()
>>> split_data = dataset.split()
>>> model = FairCstr()
>>> model.fit(split_data.X_train, split_data.y_train, split_data.s_train)
>>> model.predict(split_data.X_test)
- decision_function(X: ArrayLike) → ndarray[Any, dtype[float64]]
Predict confidence scores for samples.
Parameters
- X : array-like of shape (n_samples, n_features)
The data matrix for which we want to get the confidence scores.
Returns
- scores : ndarray of shape (n_samples,)
Confidence scores per (n_samples, n_classes) combination. In the binary case, confidence score for self.classes_[1] where >0 means this class would be predicted.
- fit(X: ArrayLike, y: ArrayLike, s: ArrayLike) → FairCstr
Fit the model according to the given training data.
Parameters
- X : array-like of shape (n_samples, n_features)
Training vector, where n_samples is the number of samples and n_features is the number of features.
- y : array-like of shape (n_samples,)
Target vector relative to X.
- s : array-like of shape (n_samples,)
Sensitive attributes relative to X.
Returns
- self
Fitted estimator.
- predict(X: ArrayLike) → ndarray[Any, dtype[Any]]
Predict class labels for samples in X.
Parameters
- X : array-like of shape (n_samples, n_features)
The data matrix for which we want to get the predictions.
Returns
- y_pred : ndarray of shape (n_samples,)
Vector containing the class labels for each sample.
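In the referenced paper, unfairness of a linear classifier is measured by the covariance between the sensitive attribute and the signed distance to the decision boundary, and training keeps that covariance small. The sketch below computes this quantity for a coefficient array shaped like `coef_`. `boundary_covariance` is a hypothetical helper, it assumes a numeric (for example 0/1) sensitive attribute and no intercept, and it is not taken from fairpy's source.

```python
import numpy as np


def boundary_covariance(X, s, coef):
    """Empirical covariance between s and the linear decision scores.

    Hypothetical helper for illustration; assumes a numeric sensitive attribute.
    """
    X = np.asarray(X, dtype=float)
    s = np.asarray(s, dtype=float)
    scores = X @ np.ravel(coef)  # signed distance up to a scale factor
    return float(np.mean((s - s.mean()) * scores))
```

A value near zero means the decision scores carry little linear information about the sensitive attribute; a large magnitude means the two move together.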
- class fairpy.model.FairGLM(solver: str = 'CG', fit_intercept: bool = True, max_iter: int = 100, lam: float = 0.001, tol: float = 0.0001)
“Fair Generalized Linear Models with a Convex Penalty”
- Reference:
- Code adopted from:
# TODO: add support for multi-class and regression
Attributes
- classes_ : ndarray of shape (n_classes,)
A list of class labels known to the classifier.
- s_classes_ : ndarray of shape (n_sensitive_group,)
A list of sensitive classes known to the classifier.
- n_features_in_ : int
Number of features seen during fit.
- feature_names_in_ : ndarray of shape (n_features_in_,)
Names of features seen during fit. Defined only when X has feature names that are all strings.
- coef_ : ndarray of shape (1, n_features)
Coefficient of the features in the decision function.
- intercept_ : ndarray of shape (1,)
Intercept (a.k.a. bias) added to the decision function.
Examples
>>> from sklearn.preprocessing import StandardScaler
>>> from fairpy.dataset import Adult
>>> from fairpy.model import FairGLM
>>> dataset = Adult()
>>> split_data = dataset.split()
>>> model = FairGLM()
>>> model.fit(split_data.X_train, split_data.y_train, split_data.s_train)
>>> model.predict(split_data.X_test)
- decision_function(X: ArrayLike) → ndarray[Any, dtype[float64]]
Predict confidence scores for samples.
Parameters
- X : array-like of shape (n_samples, n_features)
The data matrix for which we want to get the confidence scores.
Returns
- scores : ndarray of shape (n_samples,)
Confidence scores per (n_samples, n_classes) combination. In the binary case, confidence score for self.classes_[1] where >0 means this class would be predicted.
- fit(X: ArrayLike, y: ArrayLike, s: ArrayLike)
Fit the model according to the given training data.
Parameters
- X : array-like of shape (n_samples, n_features)
Training vector, where n_samples is the number of samples and n_features is the number of features.
- y : array-like of shape (n_samples,)
Target vector relative to X.
- s : array-like of shape (n_samples,)
Sensitive attributes relative to X.
Returns
- self
Fitted estimator.
- predict(X: ArrayLike) → ndarray[Any, dtype[Any]]
Predict class labels for samples in X.
Parameters
- X : array-like of shape (n_samples, n_features)
The data matrix for which we want to get the predictions.
Returns
- y_pred : ndarray of shape (n_samples,)
Vector containing the class labels for each sample.
- predict_proba(X: ArrayLike) → ndarray[Any, dtype[float64]]
Probability estimates.
The returned estimates for all classes are ordered by the label of classes.
Parameters
- X : array-like of shape (n_samples, n_features)
Vector to be scored, where n_samples is the number of samples and n_features is the number of features.
Returns
- T : array-like of shape (n_samples, n_classes)
Returns the probability of the sample for each class in the model, where classes are ordered as they are in self.classes_.
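How decision_function, predict and predict_proba relate for a binary linear model can be sketched with arrays shaped like `coef_` (1, n_features) and `intercept_` (1,). The three function names below are hypothetical, and the logistic link used for the probabilities is an assumption made for illustration; fairpy's FairGLM may use a different link.

```python
import numpy as np


def scores_from_linear_model(X, coef, intercept):
    """Confidence scores: positive values favour classes[1]."""
    X = np.asarray(X, dtype=float)
    return X @ np.ravel(coef) + float(np.ravel(intercept)[0])


def labels_from_scores(scores, classes):
    """Pick classes[1] where the score is > 0, otherwise classes[0]."""
    return np.asarray(classes)[(np.asarray(scores) > 0).astype(int)]


def proba_from_scores(scores):
    """Two-column probabilities ordered like classes (logistic link assumed)."""
    p1 = 1.0 / (1.0 + np.exp(-np.asarray(scores, dtype=float)))
    return np.column_stack([1.0 - p1, p1])
```

Column 0 of the probability matrix corresponds to `classes[0]` and column 1 to `classes[1]`, which matches the ordering described for predict_proba.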
- class fairpy.model.FairRank(K: int, P: float, alpha: float)
FA*IR: A Fair Top-k Ranking Algorithm
- Reference:
- Code adopted from:
Attributes
- s_classes_ : ndarray of shape (n_sensitive_group,)
A list of sensitive classes known to FairRank during training.
Examples
>>> from fairpy.model import FairRank
>>> model = FairRank(K=5, P=0.5, alpha=0.10)
>>> scores = [0.98, 0.97, 0.85, 0.84, 0.83, 0.55]
>>> s = ["male", "male", "male", "female", "female", "female"]
>>> fair_rank = model.transform(scores=scores, s=s)
- transform(scores: ArrayLike, s: ArrayLike) → List[int]
Transform the ranking to be fair based on scores.
Parameters
- scores : array-like of shape (n_samples,)
Scores for ranking samples, where n_samples is the number of samples.
- s : array-like of shape (n_samples,)
Sensitive attributes relative to scores.
Returns
- rank : array-like of shape (n_samples,)
Fair rank with indexes corresponding to scores.
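The idea behind a fair top-k ranking can be sketched in plain Python. Every prefix of length k must contain at least as many protected candidates as the alpha-quantile of a Binomial(k, p) distribution, and the ranking is filled greedily under that rule. This is a simplified illustration, not fairpy's code: `min_protected` and `fair_top_k` are hypothetical names, the protected group is passed as a boolean list, and the multiple-testing adjustment of alpha described in the paper is omitted.

```python
from math import comb


def min_protected(k, p, alpha):
    """Smallest m whose Binomial(k, p) CDF reaches alpha."""
    cdf = 0.0
    for m in range(k + 1):
        cdf += comb(k, m) * p**m * (1 - p) ** (k - m)
        if cdf >= alpha:
            return m
    return k


def fair_top_k(scores, protected, k, p, alpha):
    """Greedy top-k ranking; returns indexes into scores."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    prot = [i for i in order if protected[i]]      # protected, best first
    rest = [i for i in order if not protected[i]]  # others, best first
    ranking, n_prot = [], 0
    for pos in range(1, k + 1):
        if not prot and not rest:
            break  # fewer candidates than k
        need = n_prot < min_protected(pos, p, alpha)
        take_prot = bool(prot) and (
            need or not rest or scores[prot[0]] >= scores[rest[0]]
        )
        if take_prot:
            ranking.append(prot.pop(0))
            n_prot += 1
        else:
            ranking.append(rest.pop(0))
    return ranking
```

With p=0.5 and alpha=0.1 the first three positions carry no requirement, while positions four and five each need at least one protected candidate in the prefix.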
- class fairpy.model.IFair(s_idx: int | Sequence[int] | None = None, K: int = 2, max_iter: int = 200, restarts: int = 3, epsilon: float = 0.0001, w_recon: float = 1.0, w_fair: float = 1.0)
iFair: Learning Individually Fair Data Representations for Algorithmic Decision Making
- Reference:
- Code adopted from:
The time complexity is O(N^2) for every optimization iteration.
Attributes
- n_features_in_ : int
Number of features seen during fit.
- feature_names_in_ : ndarray of shape (n_features_in_,)
Names of features seen during fit. Defined only when X has feature names that are all strings.
- s_idx_ : list of int
The index(es) of the sensitive attribute(s) in the data matrix.
- opt_params_ : ndarray of shape (n_features * n_centroids + n_features,)
Solved coefficients of the probabilistic clustering.
Examples
>>> from sklearn.preprocessing import StandardScaler
>>> from fairpy.dataset import Adult
>>> from fairpy.model import IFair
>>> dataset = Adult()
>>> split_data = dataset.split()
>>> model = IFair()
>>> model.fit(split_data.X_train)
>>> fair_data = model.transform(split_data.X_train)
- fit(X: ArrayLike) → IFair
Fit the model according to the given training data.
Parameters
- X : array-like of shape (n_samples, n_features)
Training vector, where n_samples is the number of samples and n_features is the number of features.
Returns
- self
Fitted estimator.
- fit_transform(X: ArrayLike) → ndarray[Any, dtype[ScalarType]]
Fit the estimator and transform data matrix to fair data.
Parameters
- X : array-like of shape (n_samples, n_features)
Training vector, where n_samples is the number of samples and n_features is the number of features.
Returns
- X : ndarray of shape (n_samples, n_features)
Debiased training vector.
- transform(X: ArrayLike) → ndarray[Any, dtype[ScalarType]]
Transform data matrix to fair data.
Parameters
- X : array-like of shape (n_samples, n_features)
Training vector, where n_samples is the number of samples and n_features is the number of features.
Returns
- X : ndarray of shape (n_samples, n_features)
Debiased training vector.
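The shape of `opt_params_` and the O(N^2) cost per iteration can both be illustrated with a NumPy sketch of probabilistic clustering. Everything here is an assumption made for illustration: the helper names are hypothetical, and the parameter layout (feature weights first, then the prototypes row by row) is a guess that merely matches the documented length n_features * n_centroids + n_features.

```python
import numpy as np


def unpack_params(params, n_features, n_prototypes):
    """Split a flat vector into feature weights and prototypes (assumed layout)."""
    params = np.asarray(params, dtype=float)
    alpha = params[:n_features]
    prototypes = params[n_features:].reshape(n_prototypes, n_features)
    return alpha, prototypes


def soft_reconstruct(X, alpha, prototypes):
    """Map every sample to a convex combination of the prototypes."""
    X = np.asarray(X, dtype=float)
    diff = X[:, None, :] - prototypes[None, :, :]   # (n_samples, K, n_features)
    dist = (alpha * diff**2).sum(axis=2)            # weighted squared distances
    logits = -dist
    logits = logits - logits.max(axis=1, keepdims=True)  # stable softmax
    U = np.exp(logits)
    U /= U.sum(axis=1, keepdims=True)               # soft cluster memberships
    return U @ prototypes, U


def pairwise_fairness_loss(X, X_hat, s_idx):
    """Compare all N^2 pairwise distances before and after the mapping."""
    X = np.asarray(X, dtype=float)
    X_hat = np.asarray(X_hat, dtype=float)
    sensitive = set(s_idx)
    keep = [j for j in range(X.shape[1]) if j not in sensitive]
    X_ns = X[:, keep]                               # sensitive columns dropped
    d_orig = np.linalg.norm(X_ns[:, None] - X_ns[None, :], axis=2)
    d_hat = np.linalg.norm(X_hat[:, None] - X_hat[None, :], axis=2)
    return float(((d_hat - d_orig) ** 2).sum())
```

The two N-by-N distance matrices in `pairwise_fairness_loss` are what make each evaluation quadratic in the number of samples.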
- class fairpy.model.LabelBias(metric: str = 'dp', estimator: Any | None = None, max_iter: int = 100, tol: float = 0.001, lr: float = 1.0)
“Identifying and Correcting Label Bias in Machine Learning”
Adaptively learns the weights for sensitive groups by fitting the sub-estimator multiple times.
- Reference:
- Code adopted from:
https://github.com/google-research/google-research/tree/master/label_bias
# TODO: add support for equal odds
Attributes
- classes_ : ndarray of shape (n_classes,)
A list of class labels known to the classifier.
- s_classes_ : ndarray of shape (n_sensitive_group,)
A list of sensitive classes known to LabelBias during training.
- n_features_in_ : int
Number of features seen during fit.
- feature_names_in_ : ndarray of shape (n_features_in_,)
Names of features seen during fit. Defined only when X has feature names that are all strings.
- weights_ : ndarray of shape (n_samples,)
Weights for training samples solved by LabelBias.
- fitted_estimator_ : classifier object
Fitted estimator.
Examples
>>> from sklearn.preprocessing import StandardScaler
>>> from fairpy.dataset import Adult
>>> from fairpy.model import LabelBias
>>> dataset = Adult()
>>> split_data = dataset.split()
>>> model = LabelBias()
>>> model.fit(split_data.X_train, split_data.y_train, split_data.s_train)
>>> model.predict(split_data.X_test)
- fit(X: ArrayLike, y: ArrayLike, s: ArrayLike) → LabelBias
Fit the model according to the given training data.
Parameters
- X : array-like of shape (n_samples, n_features)
Training vector, where n_samples is the number of samples and n_features is the number of features.
- y : array-like of shape (n_samples,)
Target vector relative to X.
- s : array-like of shape (n_samples,)
Sensitive attributes relative to X.
Returns
- self
Fitted estimator.
- predict(X: ArrayLike) → ndarray[Any, dtype[Any]]
Predict class labels for samples in X.
Parameters
- X : array-like of shape (n_samples, n_features)
The data matrix for which we want to get the predictions.
Returns
- y_pred : ndarray of shape (n_samples,)
Vector containing the class labels for each sample.
- predict_proba(X: ndarray) → Any
Probability estimates. Only available if ‘predict_proba’ is implemented in estimator.
Parameters
- X : array-like of shape (n_samples, n_features)
Vector to be scored, where n_samples is the number of samples and n_features is the number of features.
Returns
- T : array-like of shape (n_samples, n_classes)
Returns the probability of the sample for each class in the model, where classes are ordered as they are in self.classes_.
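A reweighting rule of the kind LabelBias learns can be sketched as follows. It is an assumption for illustration, not code taken from fairpy: `debias_weights` is a hypothetical helper, group membership is given as a 0/1 matrix with one column per sensitive group, and the multipliers (one per group) would normally be updated between refits of the sub-estimator according to the measured fairness violation.

```python
import numpy as np


def debias_weights(y, group_membership, multipliers):
    """Turn per-group multipliers into per-sample training weights.

    Hypothetical helper; assumes binary labels and 0/1 group indicators.
    """
    y = np.asarray(y)
    G = np.asarray(group_membership, dtype=float)  # (n_samples, n_groups)
    lam = np.asarray(multipliers, dtype=float)     # (n_groups,)
    exponents = -(G @ lam)
    w = np.exp(exponents) / (np.exp(exponents) + np.exp(-exponents))
    # Positive and negative labels of the same sample get complementary weights.
    return np.where(y > 0, 1.0 - w, w)
```

With all multipliers at zero every sample gets weight 0.5; a positive multiplier for a group raises the weight of its positively labelled samples and lowers the weight of its negatively labelled ones.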
- class fairpy.model.LinearFairERM(estimator: Any | None = None)
Empirical risk minimization under fairness constraints
- Reference:
https://proceedings.neurips.cc/paper/2018/file/83cdcec08fbf90370fcf53bdd56604ff-Paper.pdf
- Code adopted from:
Attributes
- classes_ : ndarray of shape (n_classes,)
A list of class labels known to the classifier.
- s_classes_ : ndarray of shape (n_sensitive_group,)
A list of sensitive classes known to LinearFairERM during training.
- fitted_estimator_ : classifier object
Fitted estimator.
- n_features_in_ : int
Number of features seen during fit.
- feature_names_in_ : ndarray of shape (n_features_in_,)
Names of features seen during fit. Defined only when X has feature names that are all strings.
Examples
>>> from sklearn.preprocessing import StandardScaler
>>> from fairpy.dataset import Adult
>>> from fairpy.model import LinearFairERM
>>> dataset = Adult()
>>> split_data = dataset.split()
>>> model = LinearFairERM()
>>> model.fit(split_data.X_train, split_data.y_train, split_data.s_train)
>>> model.predict(split_data.X_test)
- decision_function(X) → Any
Predict confidence scores for samples.
Parameters
- X : array-like of shape (n_samples, n_features)
The data matrix for which we want to get the confidence scores.
Returns
- scores : ndarray of shape (n_samples,)
Confidence scores per (n_samples, n_classes) combination. In the binary case, confidence score for self.classes_[1] where >0 means this class would be predicted.
- feat_trans(X: ArrayLike) → ndarray[Any, dtype[ScalarType]]
Feature transformation for fairness.
Parameters
- X : array-like of shape (n_samples, n_features)
The data matrix for which we want to get the transformations.
Returns
- trans_X : ndarray of shape (n_samples, n_transformed_features)
Transformed data matrix.
- fit(X: ArrayLike, y: ArrayLike, s: ArrayLike) → LinearFairERM
Fit the model according to the given training data.
Parameters
- X : array-like of shape (n_samples, n_features)
Training vector, where n_samples is the number of samples and n_features is the number of features.
- y : array-like of shape (n_samples,)
Target vector relative to X.
- s : array-like of shape (n_samples,)
Sensitive attributes relative to X.
Returns
- self
Fitted estimator.
- predict(X) → Any
Predict class labels for samples in X.
Parameters
- X : array-like of shape (n_samples, n_features)
The data matrix for which we want to get the predictions.
Returns
- y_pred : ndarray of shape (n_samples,)
Vector containing the class labels for each sample.
- predict_log_proba(X) → Any
Predict logarithm of probability estimates.
The returned estimates for all classes are ordered by the label of classes.
Parameters
- X : array-like of shape (n_samples, n_features)
Vector to be scored, where n_samples is the number of samples and n_features is the number of features.
Returns
- T : array-like of shape (n_samples, n_classes)
Returns the logarithm of probability of the sample for each class in the model, where classes are ordered as they are in self.classes_.
- predict_proba(X) → Any
Probability estimates.
The returned estimates for all classes are ordered by the label of classes.
Parameters
- X : array-like of shape (n_samples, n_features)
Vector to be scored, where n_samples is the number of samples and n_features is the number of features.
Returns
- T : array-like of shape (n_samples, n_classes)
Returns the probability of the sample for each class in the model, where classes are ordered as they are in self.classes_.
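The feature transformation behind feat_trans can be pictured with the linear construction from the referenced paper: the difference u between the group means of the positively labelled samples is removed from every sample, and one coordinate is dropped. The sketch below is an illustration under stated assumptions, not fairpy's code: `fair_feature_transform` is a hypothetical helper, the sensitive attribute is binary, the positive class is labelled 1, and u has at least one non-zero entry.

```python
import numpy as np


def fair_feature_transform(X, y, s):
    """Remove the between-group mean difference of positives from X.

    Hypothetical helper; assumes binary s and positive label 1.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    s = np.asarray(s)
    groups = np.unique(s)
    pos = y == 1
    # Difference of the two groups' mean feature vectors among positives.
    u = X[pos & (s == groups[0])].mean(axis=0) - X[pos & (s == groups[1])].mean(axis=0)
    i = int(np.argmax(np.abs(u)))  # coordinate used for normalisation
    # Subtract the multiple of u that zeroes coordinate i of every sample.
    X_new = X - np.outer(X[:, i] / u[i], u)
    return np.delete(X_new, i, axis=1)  # coordinate i is now all zeros
```

After the transformation the positively labelled samples of both groups have the same mean, so a linear model trained on the new features gives both groups the same average score on positives. In this sketch the output has one column fewer than the input.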