ghattab/secondarydata

Secondary data

A book-based mushroom data set that describes physical characteristics and supports binary classification of mushrooms as poisonous or edible.

Data Set Description

Files are provided for two data sets: primary and secondary. The primary data set encodes the textbook mushroom entries, while the secondary data set is pilot data produced by simulation. The secondary data can be used for binary classification with a random forest classifier (reported accuracy and F2 score of 1).

Source

Created by Dennis Wagner, Dominik Heider, Georges Hattab

Based on Patrick Hardin. Mushrooms & Toadstools. Collins, 2012

Inspired by Jeff Schlimmer. Mushroom Data Set. 1987. URL: https://archive.ics.uci.edu/ml/datasets/Mushroom

License

All source code and accompanying data available on this site are open source and freely available for modification and remixing under the Creative Commons license CC BY 4.0.

Data Set Information

The primary data set contains descriptions of 173 mushroom species as entries. It can be used to simulate hypothetical mushrooms.

The secondary data set is the product of such a simulation and contains 61,069 hypothetical mushrooms. It can be used for binary classification.
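
The description reports an F2 score for the binary classification. For readers unfamiliar with that metric, here is a minimal sketch in plain Python of how an F-beta score can be computed from true and predicted labels. The function name `fbeta_score`, the label values "p" (poisonous) and "e" (edible), and the sample labels are assumptions made for illustration; they are not taken from this repository's code or data files.

```python
def fbeta_score(y_true, y_pred, beta=2.0, positive="p"):
    """F-beta score for one positive class.

    With beta=2 (the F2 score), recall counts more than precision,
    so missing a poisonous mushroom is penalised more heavily than
    wrongly flagging an edible one.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision or recall is zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical labels, assumed encoding: "p" = poisonous, "e" = edible.
y_true = ["p", "p", "e", "e", "p"]
y_pred = ["p", "e", "e", "p", "p"]
f2 = fbeta_score(y_true, y_pred)
print(f"F2 = {f2:.3f}")
```

A score of 1, as reported above, corresponds to predictions with no false positives and no false negatives for the positive class.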