srivmanu/C4.5

 
 

Implementation of C4.5
Data format needed :

names file :
The first line lists the classes.
Each following line describes one attribute: the attribute name, a colon, then the type of data. A discrete attribute lists its values, eg.
	attribute : (d1,d2,d3)
OR, for a continuous attribute,
	attribute : continuous
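
As an illustration of the names-file layout described above, here is a minimal parsing sketch. It is not the parser in this repository. The function name `parse_names` is hypothetical, and the sketch assumes that the class labels on the first line are comma-separated, which the description above does not state. The attribute `age` in the example is made up.

```python
def parse_names(text):
    """Parse names-file text into (classes, attributes).

    attributes maps each attribute name to either the string
    "continuous" or a list of its discrete values.
    """
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    # Assumption: the first line holds comma-separated class labels.
    classes = [c.strip() for c in lines[0].split(",")]
    attributes = {}
    for ln in lines[1:]:
        # Split "name : type" on the first colon only.
        name, _, kind = ln.partition(":")
        name, kind = name.strip(), kind.strip()
        if kind.lower() == "continuous":
            attributes[name] = "continuous"
        else:
            # Discrete: drop the surrounding parentheses, split on commas.
            attributes[name] = [v.strip() for v in kind.strip("()").split(",")]
    return classes, attributes


example = """democrat, republican
handicapped-infants : (y,n,?)
age : continuous
"""
classes, attributes = parse_names(example)
```

Keeping discrete values as a list and continuous attributes as a marker string lets later code decide between a multi-way split and a threshold split with a single type check.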

data file :
The class label should be the first OR last value of each row.
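
One way to accept the class in either position is to check which end of each row holds a known class label. The sketch below is an assumption about how that could work, not the loader in this repository. It assumes comma-separated rows, and `load_rows` is a hypothetical name.

```python
import csv
import io


def load_rows(text, classes):
    """Return (features, label) pairs from comma-separated data text."""
    rows = []
    for record in csv.reader(io.StringIO(text)):
        values = [v.strip() for v in record]
        if not values:
            continue  # csv.reader yields an empty list for a blank line
        if values[-1] in classes:
            rows.append((values[:-1], values[-1]))  # class is the last value
        elif values[0] in classes:
            rows.append((values[1:], values[0]))    # class is the first value
        else:
            raise ValueError(f"no class label found in {values!r}")
    return rows


rows = load_rows("republican,n,y\n\ndemocrat,y,n\n", ["democrat", "republican"])
```

This check is ambiguous if an attribute value can equal a class name, so a real loader may prefer an explicit setting for the class position.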

How to run : 

> python main.py [data-file-name] [names-file-name]

eg. 
> python main.py "../data/house-votes-84/house-votes-84.data" "../data/house-votes-84/house-votes-84.names"

Expected Output (the tree is printed as indented text; this sample is from an iris data set) : 

sepal length <= 2.45 :
        sepal width <= 6.8 :
                petal width <= 4.15 :
                        petal length <= 7.8 : Iris-setosa
                        petal length > 7.8 : Iris-virginica
                petal width > 4.15 : Iris-setosa
        sepal width > 6.8 : Iris-virginica
sepal length > 2.45 : Iris-virginica
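
The indented layout above can be produced by a short recursive walk. The following sketch assumes a hypothetical tree representation (a nested dict with `attribute`, `threshold`, `left` and `right` keys, and plain strings as leaves). The data structure used in this repository may differ, and the sketch covers only threshold splits on continuous attributes.

```python
def format_tree(node, indent=0):
    """Return the text lines for a binary threshold tree."""
    lines = []
    pad = " " * indent
    for op, child in (("<=", node["left"]), (">", node["right"])):
        head = f"{pad}{node['attribute']} {op} {node['threshold']} :"
        if isinstance(child, str):
            # Leaf: the class label goes on the same line as the test.
            lines.append(f"{head} {child}")
        else:
            # Subtree: print the test, then the children one level deeper.
            lines.append(head)
            lines.extend(format_tree(child, indent + 8))
    return lines


# The sample tree shown above, written in the assumed representation.
tree = {
    "attribute": "sepal length", "threshold": 2.45,
    "left": {
        "attribute": "sepal width", "threshold": 6.8,
        "left": {
            "attribute": "petal width", "threshold": 4.15,
            "left": {
                "attribute": "petal length", "threshold": 7.8,
                "left": "Iris-setosa", "right": "Iris-virginica",
            },
            "right": "Iris-setosa",
        },
        "right": "Iris-virginica",
    },
    "right": "Iris-virginica",
}
lines = format_tree(tree)
print("\n".join(lines))
```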

The code builds the tree from a random 80% of the rows in the data file (the training set).

Testing is done on the remaining 20% of the data by a function call on line 16 of main.py.
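
A random 80/20 split and an accuracy check can be sketched as follows. This is an illustration under assumptions, not the code in main.py: `train_test_split`, `accuracy` and the `predict` callable are hypothetical names, and rows are assumed to be (features, label) pairs.

```python
import random


def train_test_split(rows, train_fraction=0.8, seed=None):
    """Shuffle a copy of rows and cut it into (train, test) lists."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def accuracy(predict, test_rows):
    """Fraction of test rows whose predicted label matches the true label."""
    if not test_rows:
        return 0.0
    correct = sum(1 for features, label in test_rows if predict(features) == label)
    return correct / len(test_rows)


rows = [([i], "even" if i % 2 == 0 else "odd") for i in range(10)]
train, test = train_test_split(rows, seed=0)
```

Passing a fixed `seed` makes the split repeatable, which helps when comparing two runs on the same data.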

K-fold cross-validation is also implemented; it is called from line 18 of main.py.
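
For readers unfamiliar with k-fold cross-validation, here is a minimal sketch of the idea: the rows are divided into k folds, each fold serves once as the test set while the others form the training set, and the k scores are averaged. The names `k_fold_indices`, `cross_validate`, `build` and `evaluate` are hypothetical and do not refer to functions in this repository.

```python
import random


def k_fold_indices(n_rows, k, seed=None):
    """Yield (train_indices, test_indices) for each of k folds."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    # Striding by k gives k folds whose sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


def cross_validate(rows, k, build, evaluate, seed=None):
    """Average evaluate(model, test_rows) over k train/test partitions."""
    scores = []
    for train_idx, test_idx in k_fold_indices(len(rows), k, seed):
        model = build([rows[i] for i in train_idx])
        scores.append(evaluate(model, [rows[i] for i in test_idx]))
    return sum(scores) / len(scores)


folds = list(k_fold_indices(10, 5, seed=0))
```

Here `build` would be a function that grows a tree from training rows, and `evaluate` one that returns its accuracy on test rows.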

(at 3:38 AM 19 Feb, Testing only works for discrete data. Figuring out how to test for continuous data.)
(UPDATE 4:26 AM 19 Feb, Testing works on continuous data as well.)
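
Testing on both kinds of attribute comes down to how a row is routed at each node: a discrete attribute follows the branch for its value, while a continuous attribute is compared against a threshold. The sketch below shows this under an assumed tree representation. The dict keys (`attribute`, `threshold`, `left`, `right`, `children`, `default`) and the function name `classify` are hypothetical and are not taken from this repository, and the example tree is made up.

```python
def classify(node, row):
    """Walk the tree until a leaf (a plain class-label string) is reached."""
    while not isinstance(node, str):
        value = row[node["attribute"]]
        if "threshold" in node:
            # Continuous attribute: binary split on a numeric threshold.
            node = node["left"] if float(value) <= node["threshold"] else node["right"]
        else:
            # Discrete attribute: one branch per value, with a fallback
            # label for values that were not seen during training.
            node = node["children"].get(value, node["default"])
    return node


tree = {
    "attribute": "outlook",
    "default": "yes",
    "children": {
        "overcast": "yes",
        "sunny": {
            "attribute": "humidity", "threshold": 75.0,
            "left": "yes", "right": "no",
        },
    },
}
label = classify(tree, {"outlook": "sunny", "humidity": "70"})
```

Converting the value with `float` at lookup time allows rows to stay as strings, as they would be when read directly from a data file.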

About

A Python implementation of the C4.5 algorithm by R. Quinlan
