Skip to content

ds-and-ml-projects/costa-rican-household-poverty-level

Repository files navigation

Information

Title: Costa Rican Household Poverty Level Prediction

Website: https://www.kaggle.com/c/costa-rican-household-poverty-prediction

Purpose:

  • Here's the backstory: Many social programs have a hard time making sure the right people are given enough aid. It’s especially tricky when a program focuses on the poorest segment of the population. The world’s poorest typically can’t provide the necessary income and expense records to prove that they qualify.
  • In Latin America, one popular method uses an algorithm to verify income qualification. It’s called the Proxy Means Test (or PMT). With PMT, agencies use a model that considers a family’s observable household attributes like the material of their walls and ceiling, or the assets found in the home to classify them and predict their level of need.

Scoring Metric: Accuracy

  • Accuracy is one metric for evaluating classification models. Informally, accuracy is the fraction of predictions our model got right. Formally, accuracy has the following definition:
  • Formula: (TP + TN)/(TP + FP + TN + FN)
    • TP: True Positives
    • FP: False Positives
    • TN: True Negatives
    • FN: False Negatives

MyBinder Jupyter Interactive Notebooks:

  1. 00_DateNotebook.ipynb: Provides an overview of the the finding/insights during the project. Binder
  2. 01_DateNotebook.ipynb: An Exploratory Analysis on the dataset. Binder
  • 02_DateNotebook.ipynb: A deeper analysis and visualization understanding on the dataset. Binder
  • 03_DateNotebook.ipynb: Implemented different models like Linear SVC, GaussianNB, MLPClassifier, LogisticRegressionCV, RidgeClassifierCV, LinearDiscriminantAnalysis, KNeigbhorsClassifier, ExtraTreeClassifier, and lgb. Binder
  • 04_DateNotebook.ipynb: Implemented FeatureTools! Binder