PierrickPochelu/word_tree_label

Supervised classification with a neural network typically produces predictions as a probability vector from a single softmax layer. To let the neural network learn more complex decisions and give it a richer representation of the world, we propose teaching it a path in a decision tree instead. This broadens the range of decisions a neural network can express in real use cases. Our experiments show a better understanding of the super-class of objects.
Experiments were run on the CIFAR10 dataset with the implementation suggested by Keras; the softmax layer was replaced by a new kind of layer, named multi-optional-softmax, that returns a path in the decision tree. The CNN shows a better understanding of the data samples, decreasing the super-class error by 4.9% but increasing the inner-class error by 6.6% due to error propagation.
Our approach is generic enough to apply to any classification neural network. It is of particular interest for the classification of biological species, since living beings are naturally organized by the tree of life.

Neural network to learn paths in a decision tree

Our approach recursively partitions the semantics of the input space and assigns a label to each leaf node. The neural network jointly learns to extract features from the image via a CNN and to classify objects via a decision tree. The structure of the network is fixed, and each gate is answered with one classic softmax layer. The network is differentiable and can be trained end-to-end as usual. To use our architecture, the dataset needs to be labeled with a tree-based structure.

Recent developments in deep learning have introduced decision trees as the output of a neural network (cf. YOLO9000). A decision tree explains a decision in a more informative way than a classic MLP vector-like prediction: it structures classes like a hierarchy of ideas.

The classic softmax layer is already exclusive, so why bother with an exclusive decision tree? There are several answers.

  • If you split one big question Q into a sequence of questions q1, q2, q3, then answering q1 and q2 correctly but failing q3 leaves you at a limited distance from the ground truth in the decision tree, because you still reached q2. For example, detecting a "cat" as a "dog" is more acceptable than confusing a "cat" with a vehicle, because cats and dogs share the same super-class "animals".
  • The dataset is enriched with hierarchical information.
  • Splitting the answer into several questions allows for a better understanding of the network's decisions.
  • If the network sees a picture and detects an animal with high confidence but is uncertain what type of animal it is, we still know it is an animal.
  • As a consequence of the above bullet, the bottom of the tree may be poorly sampled because data is rare, but the super-class can still be correctly chosen. Our approach can therefore be more robust to a lack of data.

My contribution contains:

  • Keras implementation of a new block of layers I called "multi-optional-softmax". It unifies the code of exclusive and inclusive nodes.
  • A new way to store labels that describe a path in the decision tree. I introduce the sentinel value -1 to disable backpropagation through unused softmax layers.
  • Experiments with our multi-optional-softmax, summarized at the end of this page.

Keras does not provide this out of the box, which is why I created the function "multi-optional-softmax(W)", which returns a "weighted-optional-multi-softmax" layer.
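The sketch below illustrates the idea under simple assumptions: several Dense softmax blocks are built on top of the same feature vector and concatenated into one output tensor. The function name multi_optional_softmax, the group sizes, and the feature dimension are illustrative, not the repository's exact API.

```python
# Minimal sketch of a multi-softmax head in Keras (illustrative, not the repository's code).
from tensorflow import keras
from tensorflow.keras import layers

def multi_optional_softmax(features, group_sizes):
    """Build one softmax per group of classes and concatenate them into one output."""
    outputs = [layers.Dense(size, activation="softmax")(features) for size in group_sizes]
    return layers.Concatenate()(outputs)

# Example: a 2-way softmax (animal/vehicle), a 6-way (animals) and a 4-way (vehicles).
features = keras.Input(shape=(128,))      # e.g. the CNN feature vector
predictions = multi_optional_softmax(features, [2, 6, 4])
model = keras.Model(features, predictions)
```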

Decision tree implementation

Our neural network is implemented by changing both the softmax layer and the structure of the labels. The classic softmax is replaced by a new kind of layer, the multi-optional-softmax, which predicts the path in the decision tree by returning a probability for each node. A label is no longer a single class but a list of the correct directions to take at each node, describing the correct path in the decision tree.

A decision tree can contain 2 kinds of nodes: exclusive gates (one answer among N) and parallel gates (all N sub-questions are asked).

Parallel gate

A neural network can answer several questions at the same time. The network takes all paths from an inclusive node ("+" symbol below).

In this example, we answer two questions independently. Is a given point to the west or east? Is the point to the north or south?

The multi-optional-softmax contains 2 softmax layers.

The labels used to compute the loss and run the back-propagation process are as follows:

| Class name | optional-softmax1 label | optional-softmax2 label |
|------------|-------------------------|-------------------------|
| south-west | Psouth=1; Pnorth=0      | Pwest=1; Peast=0        |
| south-east | Psouth=1; Pnorth=0      | Pwest=0; Peast=1        |
| north-west | Psouth=0; Pnorth=1      | Pwest=1; Peast=0        |
| north-east | Psouth=0; Pnorth=1      | Pwest=0; Peast=1        |
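As an illustration, these parallel-gate labels could be stored by concatenating one one-hot vector per softmax. The layout below is an assumption for the sake of the example, not the repository's exact storage format.

```python
# Illustrative label encoding for the parallel gate (assumed layout):
# optional-softmax1 = (Psouth, Pnorth), optional-softmax2 = (Pwest, Peast).
import numpy as np

labels = {
    "south-west": np.array([1, 0, 1, 0], dtype="float32"),
    "south-east": np.array([1, 0, 0, 1], dtype="float32"),
    "north-west": np.array([0, 1, 1, 0], dtype="float32"),
    "north-east": np.array([0, 1, 0, 1], dtype="float32"),
}
```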

Exclusive gate

A neural network can also answer a succession of questions. The network answers a question by taking one path from an exclusive node ("X" symbol below). For example, we can ask: is the point to the west or the east? If it is in the west, is it in the south or the north?

In our multi-one-hot-vector encoding, exclusive gates are coded as classic softmax layers. The decision taken leads to the next question, and the other branch is ignored.

Some labels have a special value "-1" to disable backpropagation through those ignored softmax layers.

So the point (-0.33; 0.44) has the label [(1;0);(0;1)], meaning "west-north". The point (0.92;-0.15) has the label [(0;1);(-1;-1)], meaning the point is to the east, so the south/north softmax is disabled with "-1".

Labels are as follows:

| Class name | optional-softmax1 label | optional-softmax2 label |
|------------|-------------------------|-------------------------|
| west-south | Pwest=1; Peast=0        | Psouth=1; Pnorth=0      |
| west-north | Pwest=1; Peast=0        | Psouth=0; Pnorth=1      |
| east       | Pwest=0; Peast=1        | Psouth=-1; Pnorth=-1    |
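A loss function then has to skip the softmax groups whose targets are set to -1. The snippet below is a minimal sketch of how such a masked loss could be written in Keras; the function name, group layout and clipping trick are assumptions for illustration, not the repository's exact code.

```python
# Illustrative masked loss: a per-group categorical cross-entropy that
# ignores any group whose label vector is filled with the sentinel -1.
import tensorflow as tf

def masked_multi_softmax_loss(group_sizes):
    def loss(y_true, y_pred):
        total = 0.0
        start = 0
        for size in group_sizes:
            t = y_true[:, start:start + size]
            p = y_pred[:, start:start + size]
            # mask = 1 when the group is active (all targets >= 0), 0 when disabled
            mask = tf.cast(tf.reduce_all(t >= 0.0, axis=1), tf.float32)
            t = tf.clip_by_value(t, 0.0, 1.0)  # turn the -1 sentinels into 0 before the log
            cross_entropy = -tf.reduce_sum(t * tf.math.log(p + 1e-7), axis=1)
            total += mask * cross_entropy
            start += size
        return total
    return loss

# Usage for the exclusive-gate example above (two groups of two classes):
# model.compile(optimizer="adam", loss=masked_multi_softmax_loss([2, 2]))
```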

Experiments on CIFAR10

We experiment with deep learning on CIFAR10 with both the classic softmax layer and our multi-optional-softmax.

To experiment with our contribution, we split the famous CIFAR10 dataset into 2 super-classes: animals and vehicles.

We illustrate the corresponding decision tree below.

Our multi-optional-softmax is coded as follows:

  • optional-softmax1: Panimal; Pvehicle
  • optional-softmax2: Pbird; Pcat; Pdeer; Pdog; Pfrog; Phorse
  • optional-softmax3: Pairplane; Pcar; Pship; Ptruck

Labels are coded as described in the section "Decision tree implementation": when the animal is a cat, optional-softmax3 is disabled with -1 values, the optional-softmax1 label is Panimal=1; Pvehicle=0, and the optional-softmax2 label contains Pcat=1 with all other probabilities set to 0.
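For example, assuming the class ordering listed above, the label of a "cat" image could be encoded as follows (an illustrative sketch, not the repository's exact storage format):

```python
# Illustrative encoding of the "cat" label across the three optional softmax groups.
import numpy as np

softmax1 = np.array([1, 0])              # Panimal=1, Pvehicle=0
softmax2 = np.array([0, 1, 0, 0, 0, 0])  # bird, cat, deer, dog, frog, horse
softmax3 = np.full(4, -1)                # vehicle branch disabled with the -1 sentinel
label = np.concatenate([softmax1, softmax2, softmax3]).astype("float32")
```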

Results

Here are the results of classic softmax and our multi-optional-softmax implementation.

After 25 epochs

|                        | animals or vehicles? | CIFAR10 |
|------------------------|----------------------|---------|
| softmax, 2 outputs     | 92.11%               | -       |
| softmax, 10 outputs    | 92.63% *             | 66.69%  |
| multi-optional-softmax | 93.30%               | 65.86%  |
* To classify "animals or vehicles" with the 10-output softmax, we check whether the predicted class belongs to the animal or the vehicle super-class.


After 50 epochs

|                        | animals or vehicles? | CIFAR10 |
|------------------------|----------------------|---------|
| softmax, 2 outputs     | 92.67%               | -       |
| softmax, 10 outputs    | 93.15% *             | 68.33%  |
| multi-optional-softmax | 93.47%               | 66.11%  |

We observe that super-classes are better recognized when information about their sub-classes is added to the learning process.

More complex decision logic is possible, such as "at least N paths among M, with N<M", but it is not possible a priori with a softmax layer built to choose a single decision at each stage of the decision tree.

References

Alex Krizhevsky, Learning Multiple Layers of Features from Tiny Images, 2009. https://www.cs.toronto.edu/~kriz/cifar.html

Joseph Redmon and Ali Farhadi, YOLO9000: Better, Faster, Stronger, CoRR abs/1612.08242, 2016, http://arxiv.org/abs/1612.08242

Yongxin Yang, Irene Garcia Morillo and Timothy M. Hospedales, Deep Neural Decision Trees, CoRR abs/1806.06988, 2018, http://arxiv.org/abs/1806.06988

Yani Ioannou, Duncan P. Robertson, Darko Zikic, Peter Kontschieder, Jamie Shotton, Matthew Brown and Antonio Criminisi, Decision Forests, Convolutional Networks and the Models in-Between, CoRR abs/1603.01250, 2016, http://arxiv.org/abs/1603.01250
