Debias-In-Machine-Learning

Mitigating machine learning bias to support data ethics in the U.S. national home mortgage dataset.
📝 Note: This document is still being written.

Overview

Goal of the project

This project falls within the broader area of 'machine bias'. It uses the U.S. national mortgage dataset to:
1. Explore machine bias (discrimination), which arises when loan approvals benefit one group of people over another based on certain social attributes (legally known as protected classes, such as race, gender, and religion). Three categories [Gender, Ethnicity and Race] are examined using the mean-difference method.
2. Mitigate discrimination by implementing different methods (pre-processing, post-processing, naive fairness, etc.) with machine learning algorithms (prediction tree, random forest and logistic regression).
Ultimately, it aims to train models that perform best on both accuracy (utility) and transparency (fairness), so that the algorithms are objective and reduce social disparities.
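
As a rough illustration of the mean-difference idea mentioned in step 1, the sketch below compares approval rates between two groups. It is not the project's code and does not call themis-ML; the function name, the toy data, and the convention that `1` marks the protected group and an approved loan are assumptions made for this example.

```python
def mean_difference(outcomes, protected):
    """Approval rate of the non-protected group (protected == 0)
    minus the approval rate of the protected group (protected == 1).

    A value near 0 suggests similar treatment; a positive value
    suggests the protected group is approved less often.
    """
    non_protected = [y for y, s in zip(outcomes, protected) if s == 0]
    in_protected = [y for y, s in zip(outcomes, protected) if s == 1]
    if not non_protected or not in_protected:
        raise ValueError("both groups must contain at least one applicant")
    rate_non_protected = sum(non_protected) / len(non_protected)
    rate_protected = sum(in_protected) / len(in_protected)
    return rate_non_protected - rate_protected


# Hypothetical toy data: 1 = loan approved, 0 = loan denied.
approved = [1, 1, 1, 0, 1, 0, 0, 1]
# Hypothetical group membership: 1 = member of the protected class.
is_protected = [0, 0, 0, 0, 1, 1, 1, 1]

md = mean_difference(approved, is_protected)
print(f"mean difference: {md:.2f}")
```

The same calculation would be repeated once per protected attribute (gender, ethnicity, race), each encoded as a binary indicator.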

This project includes the following files:

clean.ipynb contains the code to clean the data
bias_indentification.ipynb contains the code to identify machine bias in the data
de-biasing.py contains the code to mitigate the machine bias
docs/final_presentations.ppt contains the slide deck
README.md summarizes and introduces the project

Dependencies and libraries:

Colaboratory is used to develop this project.
PyDrive is used to import data from Google drive into Colaboratory.
themis-ML is an open source Python library for specifying, implementing and evaluating machine bias. (Official documentation for this package can be found here)
pandas and NumPy are used for data cleaning.

I. Business and Data questions

Background

Sha Sundaram, a privacy engineer at Snap who focuses on bias in machine learning, said engineers must put themselves in the shoes of their users and try to think like them. She noted that biases in machine learning have the potential to harm users, but it's very difficult to identify those biases.

She shared a checklist she uses to help identify bias in machine learning. What training data is used? What is being put in place to improve data quality? How sensitive is a model's accuracy to changes in test datasets? What is the risk to the user if something gets mislabeled? In what scenarios can your model be applied? When should a model be retrained?

References

You can find a complete set of references for the discrimination discovery and fairness-aware methods implemented in themis-ml in this paper.

Dataset

HMDA (Home Mortgage Disclosure Act) data provides information on lending practices. The dataset includes multiple files; the primary table is the Loan Application Register (LAR), which contains:

demographic information about loan applicants, including race, gender and income;
the purpose of the loan (e.g. home purchase or improvement);
whether the buyer intends to live in the home;
the type of loan (e.g. conventional, FHA insured);
the outcome of the loan application (e.g. approved or declined);
geographical information on applicants, such as Census tract, MA (metropolitan area), state and county, total population and percentage of minority population by Census tract.

A 1% sample of the data is provided as a CSV file.

II. Data Preparation

This section contains three parts:

Feature selection
Attribute transformation on:
- Target variable
- Protected attributes
Null value elimination
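
The three parts above could be sketched with pandas roughly as follows. This is a minimal illustration, not the notebook's actual code: the column names (`action_taken`, `applicant_sex`, `loan_amount`, `respondent_id`), the category labels, and the tiny inline table are all hypothetical stand-ins for the real LAR fields.

```python
import pandas as pd

# Hypothetical miniature of the LAR table; names and values are assumptions.
raw = pd.DataFrame({
    "action_taken": ["approved", "denied", "approved", None, "approved"],
    "applicant_sex": ["Female", "Male", "Female", "Male", None],
    "loan_amount": [120, 200, None, 90, 150],
    "respondent_id": ["a1", "b2", "c3", "d4", "e5"],
})


def prepare(df):
    # Feature selection: keep only the columns used for modelling.
    selected = df[["action_taken", "applicant_sex", "loan_amount"]]

    # Null value elimination: done before encoding so that a missing
    # value is dropped rather than silently encoded as 0.
    selected = selected.dropna()

    # Attribute transformation: binary target and binary protected attribute.
    prepared = pd.DataFrame({
        "approved": (selected["action_taken"] == "approved").astype(int),
        "female": (selected["applicant_sex"] == "Female").astype(int),
        "loan_amount": selected["loan_amount"],
    })
    return prepared.reset_index(drop=True)


clean = prepare(raw)
print(clean)
```

Dropping rows is the simplest way to handle nulls; whether imputation would be more appropriate depends on how much of the real dataset is missing.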
