- The dataset's last column must be the target (label) column.
- Declare the column names in tree.py's main function via `header = []`, entering each name in order; the last entry is the target column.
- Every datapoint is converted to a string, because at this stage the actual numeric values are neither relevant nor necessary.
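The string conversion in the last point can be sketched as a small CSV loader. This is a hypothetical helper (the name `load_dataset` is illustrative, not necessarily what tree.py uses); the point is that every value, including range labels like `'2-3'`, becomes a plain string.

```python
import csv

def load_dataset(path):
    """Read a CSV file and cast every value to str.

    Illustrative helper: treating all values as strings lets the tree
    compare range labels such as '2-3' as ordinary categorical values.
    """
    with open(path, newline="") as f:
        return [[str(v) for v in row] for row in csv.reader(f)]
```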
- Clone the repo.
- Install Python.
- In tree.py's main function, specify the training dataset file, the testing dataset file, and the headers.
- Call build_tree to build the tree; print it if needed.
- Call the test function to evaluate on the testing data, or run classify directly on a new row.
This tree will work on pretty much any dataset. Add as many feature columns as needed; Gini impurity and information gain are calculated recursively across all of them.
The tree calculates the Gini impurity for every viable partition, then uses information gain to decide the order in which questions are asked.
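The two quantities above can be computed as follows. This is a minimal sketch of the standard formulas, not necessarily the exact code in tree.py; it assumes the last column of each row is the label.

```python
from collections import Counter

def gini(rows):
    """Gini impurity of a list of rows (last column = label)."""
    counts = Counter(row[-1] for row in rows)
    n = len(rows)
    return 1 - sum((c / n) ** 2 for c in counts.values())

def info_gain(left, right, current_impurity):
    """Impurity reduction from splitting rows into left/right partitions."""
    p = len(left) / (len(left) + len(right))
    return current_impurity - p * gini(left) - (1 - p) * gini(right)
```

For example, a set of two 'Fern' rows and one 'Cactus' row has impurity 1 - ((2/3)^2 + (1/3)^2) = 4/9; a split that separates the two classes perfectly recovers all of that impurity as gain.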
- We may use different data in the future. The Data Generation file lets us create a large dataset just by inputting values as ranges: it builds the dataset by selecting a random value from each range as many times as you like. A decision tree is well suited to data generated this way.
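The generation idea can be sketched like this. The function name and signature are assumptions for illustration, not the actual Data Generation file's API: each feature is described by a (low, high) range, and every generated row draws one random value per range.

```python
import random

def generate_rows(feature_ranges, n_rows, seed=None):
    """Sketch of range-based data generation (hypothetical helper).

    feature_ranges: list of (low, high) inclusive integer ranges,
    one per feature column. Returns n_rows rows, each built by
    drawing a random value from every range.
    """
    rng = random.Random(seed)  # seed allows reproducible datasets
    return [[rng.randint(lo, hi) for lo, hi in feature_ranges]
            for _ in range(n_rows)]
```

For example, `generate_rows([(2, 3), (1, 5)], 100)` yields 100 rows whose first feature is always 2 or 3 and whose second is between 1 and 5.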
We have been given every possibility, and they are represented by the range values. Key points:
From the data we can see that users are placed into a range, for example '2-3'; we never see plants recommended for only 2 or only 3, as those bare values are not unique values in the dataset. The data also never overlaps any of the ranges. Thus we can keep each range intact and treat it as a single value.
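Because the ranges never overlap and never appear as bare numbers, a split can compare them with plain string equality. A minimal sketch (the name `partition` is illustrative, not necessarily tree.py's):

```python
def partition(rows, col, value):
    """Split rows on string equality in one column.

    Range labels like '2-3' are unique, non-overlapping categorical
    values, so exact string comparison is safe and no numeric
    parsing of the range endpoints is needed.
    """
    true_rows = [r for r in rows if r[col] == value]
    false_rows = [r for r in rows if r[col] != value]
    return true_rows, false_rows
```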