Stop using Boston dataset in tests and examples #494

Open
wants to merge 6 commits into base: master

Conversation

StrikerRUS
Member

@StrikerRUS StrikerRUS commented Jan 26, 2022

The Boston dataset is deprecated as of scikit-learn 1.0 and will be removed in 1.2 for ethical reasons. Details: scikit-learn/scikit-learn#16155.

  /usr/local/lib/python3.7/dist-packages/sklearn/utils/deprecation.py:87: FutureWarning: Function load_boston is deprecated; `load_boston` is deprecated in 1.0 and will be removed in 1.2.
  
      The Boston housing prices dataset has an ethical problem. You can refer to
      the documentation of this function for further details.
  
      The scikit-learn maintainers therefore strongly discourage the use of this
      dataset unless the purpose of the code is to study and educate about
      ethical issues in data science and machine learning.
  
      In this special case, you can fetch the dataset from the original
      source::
  
          import pandas as pd
          import numpy as np
  
  
          data_url = "http://lib.stat.cmu.edu/datasets/boston"
          raw_df = pd.read_csv(data_url, sep="\s+", skiprows=22, header=None)
          data = np.hstack([raw_df.values[::2, :], raw_df.values[1::2, :2]])
          target = raw_df.values[1::2, 2]
  
      Alternative datasets include the California housing dataset (i.e.
      :func:`~sklearn.datasets.fetch_california_housing`) and the Ames housing
      dataset. You can load the datasets as follows::
  
          from sklearn.datasets import fetch_california_housing
          housing = fetch_california_housing()
  
      for the California housing dataset and::
  
          from sklearn.datasets import fetch_openml
          housing = fetch_openml(name="house_prices", as_frame=True)
  
      for the Ames housing dataset.
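
To catch any remaining uses of the deprecated loader in tests and examples, one option is to check whether a call emits a `FutureWarning`. The following is only a minimal sketch using the standard-library `warnings` module. The names `deprecated_loader` and `emits_future_warning` are hypothetical helpers invented for illustration; `deprecated_loader` merely stands in for a deprecated function such as `load_boston` and is not part of scikit-learn or of this PR.

```python
import warnings


def deprecated_loader():
    """Hypothetical stand-in for a deprecated loader such as load_boston."""
    warnings.warn(
        "Function load_boston is deprecated; `load_boston` is deprecated "
        "in 1.0 and will be removed in 1.2.",
        FutureWarning,
        stacklevel=2,
    )
    return "data"


def emits_future_warning(func):
    """Return True if calling func() raises at least one FutureWarning."""
    with warnings.catch_warnings(record=True) as caught:
        # Record every warning, even ones already shown once in this process.
        warnings.simplefilter("always")
        func()
    return any(issubclass(w.category, FutureWarning) for w in caught)


if __name__ == "__main__":
    print(emits_future_warning(deprecated_loader))
    print(emits_future_warning(lambda: None))
```

In a real test suite, a similar effect can be had by running Python with `-W error::FutureWarning`, which turns the warning into an exception so that leftover calls fail loudly.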
