forked from barisesmer/C4.5
Implementation of the C4.5 decision tree algorithm in Python.
Data format needed :
names file :
classes should be listed on the first line.
attributes should be listed one per line, each followed by a colon and its type: either the list of discrete values or the word continuous, e.g.
attribute : (d1,d2,d3)
OR
attribute : continuous
data file :
the class should be the first OR last attribute of each row.
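To make the names-file layout concrete, here is a minimal parsing sketch. It is not the parser used by main.py. The function name parse_names is hypothetical, the sample text is made up for illustration, and it assumes the class labels on the first line are separated by commas (the separator is not stated above).

```python
def parse_names(text):
    """Parse names-file text into (classes, attributes).

    attributes is a list of (name, values) pairs, where values is a list
    of discrete values, or None for a continuous attribute.
    """
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    # Assumption: class labels on the first line are comma-separated.
    classes = [c.strip() for c in lines[0].split(",")]
    attributes = []
    for ln in lines[1:]:
        # Split "name : type" at the first colon.
        name, _, kind = ln.partition(":")
        kind = kind.strip()
        if kind == "continuous":
            values = None
        else:
            # Discrete type is written as (d1,d2,d3).
            values = [v.strip() for v in kind.strip("()").split(",")]
        attributes.append((name.strip(), values))
    return classes, attributes


# Made-up sample in the format described above.
sample = """democrat, republican
handicapped-infants : (y,n,?)
duration : continuous
"""
classes, attributes = parse_names(sample)
```

Returning None for continuous attributes lets the caller tell the two attribute kinds apart without a separate type field.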
How to run :
> python main.py [data-file-name] [names-file-name]
e.g.
> python main.py "../data/house-votes-84/house-votes-84.data" "../data/house-votes-84/house-votes-84.names"
Expected Output :
sepal length <= 2.45 :
    sepal width <= 6.8 :
        petal width <= 4.15 :
            petal length <= 7.8 : Iris-setosa
            petal length > 7.8 : Iris-virginica
        petal width > 4.15 : Iris-setosa
    sepal width > 6.8 : Iris-virginica
sepal length > 2.45 : Iris-virginica
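As a rough illustration of how output in this style can be produced, the sketch below prints a small tree of threshold splits. The tuple layout (attribute, threshold, low branch, high branch) and the name format_tree are assumptions made for this example, not the data structure used in this repository, and only continuous (<= / >) splits are handled. Nested branches are indented by four spaces.

```python
def format_tree(node, indent=""):
    """Return the lines describing a tree of binary threshold splits.

    A node is (attribute, threshold, low, high); a branch is either
    another node or a class label given as a string.
    """
    attribute, threshold, low, high = node
    lines = []
    for op, child in (("<=", low), (">", high)):
        head = f"{indent}{attribute} {op} {threshold} :"
        if isinstance(child, str):
            # Leaf: print the class label on the same line.
            lines.append(f"{head} {child}")
        else:
            # Internal node: print the test, then the subtree one level deeper.
            lines.append(head)
            lines.extend(format_tree(child, indent + "    "))
    return lines


# A subtree taken from the sample output above.
tree = ("petal width", 4.15,
        ("petal length", 7.8, "Iris-setosa", "Iris-virginica"),
        "Iris-setosa")
print("\n".join(format_tree(tree)))
```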
The code builds the tree from a random 80% of the rows in the data file (training).
Testing is done on the remaining 20% of the rows by a function call on line 16 of main.py.
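The 80/20 split described above can be sketched as follows. This is an illustrative version, not the code in main.py. The function name train_test_split and its parameters are hypothetical, and the optional seed is added here only to make the example repeatable.

```python
import random


def train_test_split(rows, train_fraction=0.8, seed=None):
    """Shuffle a copy of rows and split it into (train, test)."""
    rng = random.Random(seed)
    shuffled = list(rows)  # copy, so the caller's list is left untouched
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


rows = list(range(10))  # stand-in for the rows of a data file
train, test = train_test_split(rows, seed=0)
```

Every row lands in exactly one of the two lists, so the test rows are never seen while the tree is built.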
K-fold cross validation has also been implemented; it is called from line 18 of main.py.
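For readers unfamiliar with K-fold cross validation, a minimal sketch is given here. It shows the general technique, not necessarily how this repository implements it. The names k_fold_indices, cross_validate, train_fn and accuracy_fn are hypothetical placeholders; in practice the rows would normally be shuffled first.

```python
def k_fold_indices(n_rows, k):
    """Yield (train_idx, test_idx) index lists for each of k folds."""
    indices = list(range(n_rows))
    # Spread the remainder so fold sizes differ by at most one.
    fold_sizes = [n_rows // k + (1 if i < n_rows % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


def cross_validate(rows, k, train_fn, accuracy_fn):
    """Average accuracy_fn(model, test_rows) over k folds.

    train_fn(train_rows) builds a model; both callables are supplied
    by the caller.
    """
    scores = []
    for train_idx, test_idx in k_fold_indices(len(rows), k):
        model = train_fn([rows[i] for i in train_idx])
        scores.append(accuracy_fn(model, [rows[i] for i in test_idx]))
    return sum(scores) / len(scores)
```

Each row is used for testing exactly once and for training in the other k - 1 folds, which gives a steadier accuracy estimate than a single 80/20 split.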
(As of 3:38 AM, 19 Feb: testing only works for discrete data. Still figuring out how to test continuous data.)
(UPDATE 4:26 AM, 19 Feb: testing works on continuous data as well.)