PierrickPochelu/word_tree_label

Supervised classification with a neural network typically produces predictions as a probability vector from a single softmax layer. To let the neural network learn more complex decisions and give it a richer representation of the world, we propose teaching it a path in a decision tree instead. This broadens the range of decisions a neural network can express in real use cases. Our experiments show a better understanding of the super-class of objects.
Experiments were run on the CIFAR10 dataset with the implementation suggested by Keras; the softmax layer was replaced by a new kind of layer, named multi-optional-softmax, that returns a path in the decision tree. The CNN shows a better understanding of the data samples, decreasing the super-class error by 4.9% but increasing the inner-class error by 6.6% due to error propagation.
Our approach is generic enough to apply to any classification neural network. It is of particular interest for the classification of biological species, since living beings are naturally organized by the tree of life.

Neural network to learn paths in a decision tree

Our approach recursively partitions the semantics of the input space and assigns a label to each leaf node. The neural network jointly learns to extract features from the image via a CNN and to classify objects via a decision tree. The structure of the network is fixed, and each gate is answered with one classic softmax layer. The network is differentiable and can be trained end-to-end as usual. To use our architecture, the dataset needs to be labeled with a tree-based structure.

Recent developments in deep learning have introduced decision trees as the output of a neural network (cf. YOLO9000). A decision tree explains a decision in a more informative way than a classic MLP vector-like prediction: it structures classes like a hierarchy of ideas.

The classic softmax layer is already exclusive, so why bother with an exclusive decision tree? There are several answers.

  • If you split one big question Q into a sequence of questions q1, q2, q3, then answering q1 and q2 correctly but failing q3 leaves you at a limited distance from the ground truth in the decision tree, because you still reached q2. For example, detecting a "cat" as a "dog" is more acceptable than confusing a "cat" with a vehicle, because cats and dogs share the same super-class "animals".
  • The dataset is enriched with hierarchical information.
  • Splitting the answer into several questions allows for a better understanding of the network's decisions.
  • If the network sees a picture and detects an animal with high confidence but is uncertain what type of animal it is, we still know it is an animal.
  • As a consequence of the above bullet, the bottom of the tree may be poorly sampled because data is rare, but the super-class can still be correctly chosen. Our approach can therefore be more robust to a lack of data.

My contribution contains:

  • Keras implementation of a new block of layers I called "multi-optional-softmax". It unifies the code of exclusive and inclusive nodes.
  • A new way to store labels that describe a path in the decision tree. I introduce the sentinel value -1 to disable backpropagation through unused softmax layers.
  • Experiments with our multi-optional-softmax, summarized at the end of this page.

Keras does not provide this out of the box, which is why I created the function "multi-optional-softmax(W)", which returns a "weighted-optional-multi-softmax" layer.
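The sketch below illustrates the idea under simple assumptions: several Dense softmax blocks are built on top of the same feature vector and concatenated into one output tensor. The function name multi_optional_softmax, the group sizes, and the feature dimension are illustrative, not the repository's exact API.

```python
# Minimal sketch of a multi-softmax head in Keras (illustrative, not the repository's code).
from tensorflow import keras
from tensorflow.keras import layers

def multi_optional_softmax(features, group_sizes):
    """Build one softmax per group of classes and concatenate them into one output."""
    outputs = [layers.Dense(size, activation="softmax")(features) for size in group_sizes]
    return layers.Concatenate()(outputs)

# Example: a 2-way softmax (animal/vehicle), a 6-way (animals) and a 4-way (vehicles).
features = keras.Input(shape=(128,))      # e.g. the CNN feature vector
predictions = multi_optional_softmax(features, [2, 6, 4])
model = keras.Model(features, predictions)
```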

Decision tree implementation

Our neural network is implemented by changing both the softmax layer and the structure of the labels. The classic softmax is replaced by a new kind of layer, the multi-optional-softmax, which predicts the path in the decision tree by returning a probability for each node. A label is no longer a single class but a list of the correct directions to take at each node, describing the correct path in the decision tree.

A decision tree can contain 2 kinds of nodes: exclusive gates (one answer among N) and parallel gates (all N sub-questions are asked).

Parallel gate

A neural network can answer several questions at the same time. The network takes all paths from an inclusive node ("+" symbol below).

In this example, we answer two questions independently. Is a given point to the west or east? Is the point to the north or south?

The multi-optional-softmax contains 2 softmax layers.

The labels used to compute the loss and run the back-propagation process are as follows:

| Class name | optional-softmax1 label | optional-softmax2 label |
|------------|-------------------------|-------------------------|
| south-west | Psouth=1; Pnorth=0      | Pwest=1; Peast=0        |
| south-east | Psouth=1; Pnorth=0      | Pwest=0; Peast=1        |
| north-west | Psouth=0; Pnorth=1      | Pwest=1; Peast=0        |
| north-east | Psouth=0; Pnorth=1      | Pwest=0; Peast=1        |
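As an illustration, these parallel-gate labels could be stored by concatenating one one-hot vector per softmax. The layout below is an assumption for the sake of the example, not the repository's exact storage format.

```python
# Illustrative label encoding for the parallel gate (assumed layout):
# optional-softmax1 = (Psouth, Pnorth), optional-softmax2 = (Pwest, Peast).
import numpy as np

labels = {
    "south-west": np.array([1, 0, 1, 0], dtype="float32"),
    "south-east": np.array([1, 0, 0, 1], dtype="float32"),
    "north-west": np.array([0, 1, 1, 0], dtype="float32"),
    "north-east": np.array([0, 1, 0, 1], dtype="float32"),
}
```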

Exclusive gate

A neural network can also answer a succession of questions. The network answers a question by taking one path from an exclusive node ("X" symbol below). For example, we can ask: is the point to the west or the east? If it is in the west, is it in the south or the north?

In our multi-one-hot-vector encoding, exclusive gates are coded as classic softmax layers. The decision taken leads to the next question, and the other branch is ignored.

Some labels have a special value "-1" to disable backpropagation through those ignored softmax layers.

So the point (-0.33; 0.44) has the label [(1;0);(0;1)], meaning "west-north". The point (0.92;-0.15) has the label [(0;1);(-1;-1)], meaning the point is to the east, so the south/north softmax is disabled with "-1".

Labels are as follows:

| Class name | optional-softmax1 label | optional-softmax2 label |
|------------|-------------------------|-------------------------|
| west-south | Pwest=1; Peast=0        | Psouth=1; Pnorth=0      |
| west-north | Pwest=1; Peast=0        | Psouth=0; Pnorth=1      |
| east       | Pwest=0; Peast=1        | Psouth=-1; Pnorth=-1    |
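A loss function then has to skip the softmax groups whose targets are set to -1. The snippet below is a minimal sketch of how such a masked loss could be written in Keras; the function name, group layout and clipping trick are assumptions for illustration, not the repository's exact code.

```python
# Illustrative masked loss: a per-group categorical cross-entropy that
# ignores any group whose label vector is filled with the sentinel -1.
import tensorflow as tf

def masked_multi_softmax_loss(group_sizes):
    def loss(y_true, y_pred):
        total = 0.0
        start = 0
        for size in group_sizes:
            t = y_true[:, start:start + size]
            p = y_pred[:, start:start + size]
            # mask = 1 when the group is active (all targets >= 0), 0 when disabled
            mask = tf.cast(tf.reduce_all(t >= 0.0, axis=1), tf.float32)
            t = tf.clip_by_value(t, 0.0, 1.0)  # turn the -1 sentinels into 0 before the log
            cross_entropy = -tf.reduce_sum(t * tf.math.log(p + 1e-7), axis=1)
            total += mask * cross_entropy
            start += size
        return total
    return loss

# Usage for the exclusive-gate example above (two groups of two classes):
# model.compile(optimizer="adam", loss=masked_multi_softmax_loss([2, 2]))
```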

Experiments on CIFAR10

We experiment with deep learning on CIFAR10 with both the classic softmax layer and our multi-optional-softmax.

To experiment with our contribution, we split the famous CIFAR10 dataset into 2 super-classes: animals and vehicles.

We illustrate the corresponding decision tree below.

Our multi-optional-softmax is coded as follows:

  • optional-softmax1: Panimal; Pvehicle
  • optional-softmax2: Pbird; Pcat; Pdeer; Pdog; Pfrog; Phorse
  • optional-softmax3: Pairplane; Pcar; Pship; Ptruck

Labels are coded as described in the section "Decision tree implementation": when the animal is a cat, optional-softmax3 is disabled with -1 values, the optional-softmax1 label is Panimal=1; Pvehicle=0, and the optional-softmax2 label contains Pcat=1 with all other probabilities set to 0.
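For example, assuming the class ordering listed above, the label of a "cat" image could be encoded as follows (an illustrative sketch, not the repository's exact storage format):

```python
# Illustrative encoding of the "cat" label across the three optional softmax groups.
import numpy as np

softmax1 = np.array([1, 0])              # Panimal=1, Pvehicle=0
softmax2 = np.array([0, 1, 0, 0, 0, 0])  # bird, cat, deer, dog, frog, horse
softmax3 = np.full(4, -1)                # vehicle branch disabled with the -1 sentinel
label = np.concatenate([softmax1, softmax2, softmax3]).astype("float32")
```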

Results

Here are the results of classic softmax and our multi-optional-softmax implementation.

After 25 epochs

|                        | animals or vehicles? | CIFAR10 |
|------------------------|----------------------|---------|
| softmax, 2 outputs     | 92.11%               | -       |
| softmax, 10 outputs    | 92.63% *             | 66.69%  |
| multi-optional-softmax | 93.30%               | 65.86%  |
* To classify "animals or vehicles" with the 10-output softmax, we check whether the predicted class belongs to the animal or the vehicle super-class.


After 50 epochs

|                        | animals or vehicles? | CIFAR10 |
|------------------------|----------------------|---------|
| softmax, 2 outputs     | 92.67%               | -       |
| softmax, 10 outputs    | 93.15% *             | 68.33%  |
| multi-optional-softmax | 93.47%               | 66.11%  |

We observe that super-classes are better recognized when information about their sub-classes is added to the learning process.

More complex decision logic is possible, such as "at least N paths among M, with N<M", but it is not possible a priori with a softmax layer built to choose a single decision at each stage of the decision tree.

References

Alex Krizhevsky, Learning Multiple Layers of Features from Tiny Images, 2009. https://www.cs.toronto.edu/~kriz/cifar.html

Joseph Redmon and Ali Farhadi, YOLO9000: Better, Faster, Stronger, CoRR abs/1612.08242, 2016, http://arxiv.org/abs/1612.08242

Yongxin Yang, Irene Garcia Morillo and Timothy M. Hospedales, Deep Neural Decision Trees, CoRR abs/1806.06988, 2018, http://arxiv.org/abs/1806.06988

Yani Ioannou, Duncan P. Robertson, Darko Zikic, Peter Kontschieder, Jamie Shotton, Matthew Brown and Antonio Criminisi, Decision Forests, Convolutional Networks and the Models in-Between, CoRR abs/1603.01250, 2016, http://arxiv.org/abs/1603.01250
