Are Deep Neural Networks Dramatically Overfitted?

[Updated on 2019-05-27: added a section on the Lottery Ticket Hypothesis.] If you are like me, entering the field of deep learning with experience in traditional machine learning, you may often ponder over this question: since a typical deep neural network has so many parameters and its training error can easily be made perfect, it should surely suffer from substantial overfitting. How could it ever generalize to out-of-sample data points? The effort to understand why deep neural networks can generalize somehow reminds me of this interesting paper on Systems Biology: “Can a biologist fix a radio?...

March 14, 2019 · 22 min · Lilian Weng

Anatomize Deep Learning with Information Theory

Professor Naftali Tishby passed away in 2021. I hope this post can introduce his cool idea of the Information Bottleneck to more people. Recently I watched the talk “Information Theory in Deep Learning” by Prof. Naftali Tishby and found it very interesting. He presented how to apply information theory to study the growth and transformation of deep neural networks during training. Using the Information Bottleneck (IB) method, he proposed a new learning bound for deep neural networks (DNNs), as traditional learning theory fails due to the exponentially large number of parameters....

September 28, 2017 · 9 min · Lilian Weng
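The teaser above names the Information Bottleneck method without stating it. As a quick reminder (this is the standard IB formulation, not the specific learning bound from the talk), the IB objective seeks a representation $T$ of the input $X$ that is as compressed as possible while remaining predictive of the label $Y$:

$$ \min_{p(t \mid x)} \; I(X; T) - \beta \, I(T; Y) $$

where $I(\cdot\,;\cdot)$ denotes mutual information and $\beta > 0$ trades off compression of $X$ against preservation of information about $Y$; in the deep learning setting, $T$ is taken to be the activations of a hidden layer.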