
Knowledge distillation with Keras

Keras implementation of Hinton's knowledge distillation (KD), a technique for transferring knowledge from a large model (the teacher) into a smaller model (the student).
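
A minimal sketch of the core idea (illustrative only, not the exact code in this repository): the student is trained on a weighted sum of a soft-target term, computed from temperature-softened teacher and student predictions, and a standard hard-label term. The TEMPERATURE and ALPHA values below are assumed hyperparameters.

```python
import tensorflow as tf

# Illustrative hyperparameters (assumptions, not values from this repository)
TEMPERATURE = 5.0  # T > 1 softens the probability distributions
ALPHA = 0.9        # weight of the soft-target (distillation) term

def distillation_loss(y_true, teacher_logits, student_logits):
    """Hinton-style KD loss: softened teacher targets plus hard ground-truth labels."""
    # Temperature-softened probability distributions of teacher and student
    soft_targets = tf.nn.softmax(teacher_logits / TEMPERATURE)
    soft_preds = tf.nn.softmax(student_logits / TEMPERATURE)
    # KL divergence between the softened distributions; the T^2 factor keeps the
    # gradient magnitudes comparable to the hard-label term, as in the paper
    soft_loss = tf.keras.losses.kullback_leibler_divergence(soft_targets, soft_preds)
    soft_loss *= TEMPERATURE ** 2
    # Standard cross-entropy against the ground-truth class indices
    hard_loss = tf.keras.losses.sparse_categorical_crossentropy(
        y_true, tf.nn.softmax(student_logits))
    return ALPHA * soft_loss + (1.0 - ALPHA) * hard_loss
```

In practice the teacher's logits can be precomputed once for the whole training set, so the teacher does not have to be evaluated at every student training step.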

Dataset

The Caltech-256 dataset is used to demonstrate the technique (a data-preparation sketch follows the list below).

  • I resize all images to 299x299.
  • For validation I use 20 images from each category.
  • For training I use 100 images from each category.
  • I use random crops and color augmentation to balance the dataset.
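
A rough sketch of how the data pipeline could look with tf.keras; the directory paths, batch size, and the specific augmentations (channel shifts and horizontal flips standing in for the crop/color augmentation described above) are assumptions, not the repository's actual settings.

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Hypothetical directory layout: one sub-folder per Caltech-256 category
TRAIN_DIR, VAL_DIR = "data/train", "data/val"

# Color augmentation via channel shifts; horizontal flips as a simple stand-in
# for random crops (true random crops would need a custom preprocessing_function).
train_datagen = ImageDataGenerator(
    rescale=1.0 / 255,
    channel_shift_range=30.0,
    horizontal_flip=True,
)
val_datagen = ImageDataGenerator(rescale=1.0 / 255)

train_generator = train_datagen.flow_from_directory(
    TRAIN_DIR, target_size=(299, 299), batch_size=64, class_mode="sparse")
val_generator = val_datagen.flow_from_directory(
    VAL_DIR, target_size=(299, 299), batch_size=64, class_mode="sparse")
```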

CNN Architectures

  • Teacher: Xception
  • Student: SqueezeNet
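
A minimal sketch of how the teacher could be instantiated with tf.keras; the ImageNet weights, pooling head, and class count are assumptions. SqueezeNet is not bundled with keras.applications, so the student would come from a separate definition or a third-party implementation.

```python
from tensorflow.keras.applications import Xception
from tensorflow.keras.layers import Dense, GlobalAveragePooling2D
from tensorflow.keras.models import Model

NUM_CLASSES = 256  # Caltech-256 object categories (257 if the clutter class is kept)

# Teacher: Xception pre-trained on ImageNet, with a fresh classification head.
# The Dense layer has no activation, so the model outputs logits that a
# distillation loss can soften with a temperature.
base = Xception(weights="imagenet", include_top=False, input_shape=(299, 299, 3))
x = GlobalAveragePooling2D()(base.output)
teacher_logits = Dense(NUM_CLASSES, name="teacher_logits")(x)
teacher = Model(inputs=base.input, outputs=teacher_logits)
```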

Tutorial

Please complete the tutorial if you want to learn how to use knowledge distillation to train a student model with unlabeled data.

Prerequisites and Requirements

Skills

  • Azure subscription ID. Make sure your subscription allows you to create compute targets (compare VM sizes).
  • Knowledge of basic data science and machine learning concepts. Here and here you'll find short introductory material.
  • Moderate skills in Python programming and machine learning with Python. A good place to start is here.
  • Prior experience with Azure Machine Learning (AML) services is recommended.

Software Dependencies

References

  • Buciluǎ, Cristian, Rich Caruana, and Alexandru Niculescu-Mizil. "Model compression." Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 2006.

  • Hinton, Geoffrey, Oriol Vinyals, and Jeff Dean. "Distilling the knowledge in a neural network." arXiv preprint arXiv:1503.02531 (2015).