What's hidden in an overparameterized neural network with random weights? If the distribution is properly scaled (e.g. Kaiming Normal), then it contains a subnetwork which achieves high accuracy without ever modifying the values of the weights...
Very interesting, but also not so interesting, because (1) isn't finding a subset of a net (almost) equivalent to training the net? and (2) the more random weights you sample, the better your chance that a good subnetwork is in there.
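
Point (2) can be made concrete with a toy calculation. The sketch below is not the search procedure from the quoted paper; it is a simplified illustration that assumes a single hypothetical target weight, a tolerance `eps`, and candidate weights drawn uniformly from [-1, 1] (all of these are made up for the example). It compares the closed-form chance that at least one of `n` random draws lands close to the target, 1 - (1 - p)^n, against a quick Monte Carlo estimate.

```python
import random


def prob_at_least_one_hit(p, n):
    """Chance that at least one of n independent draws succeeds,
    when each draw succeeds with probability p."""
    return 1.0 - (1.0 - p) ** n


def simulate(target, eps, n, trials=2000, seed=0):
    """Monte Carlo estimate of the same quantity: the fraction of trials
    in which some weight drawn uniformly from [-1, 1] is within eps of target."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    hits = 0
    for _ in range(trials):
        if any(abs(rng.uniform(-1.0, 1.0) - target) <= eps for _ in range(n)):
            hits += 1
    return hits / trials


if __name__ == "__main__":
    target, eps = 0.3, 0.05  # hypothetical target weight and tolerance
    # The accepted interval has width 2*eps inside a range of width 2,
    # so a single draw succeeds with probability eps.
    p = eps
    for n in (1, 10, 100):
        print(n, round(prob_at_least_one_hit(p, n), 3), simulate(target, eps, n))
```

With these assumed numbers the closed-form chance goes from 0.05 for one draw to about 0.40 for ten and about 0.99 for a hundred, which is the "sample more, increase your chance" intuition in its simplest form. A real network needs many weights to line up at once, so this only shows the direction of the effect, not its size.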