The Anatomy of Small Neural Networks: Parameter Space Structure, Architecture Selection, and the Depth-Width Tradeoff
We present a systematic empirical investigation of feedforward ReLU networks at minimal scale (1-16 hidden neurons), revealing structural properties of parameter space, training dynamics, and architecture selection that are obscured at practical scale. Through 11 controlled experiments totaling over 10,000 training runs, we establish several novel findings: (1) the Hessian eigenspectrum of a trained network decomposes into exactly four tiers (steep, moderate, weak, and zero) whose counts are predictable from architecture alone, with the steep tier recovering the target function's degrees of freedom; (2) this structure emerges during training via a sharp phase transition (for a 16-neuron network, 27 steep eigenvalues collapse to 4 between steps 50 and 100), after which 97% of training time is spent exploring the solution manifold rather than improving the learned function; (3) exhau
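As a minimal sketch of the measurement behind finding (1), the code below builds a 1-16-1 ReLU network in JAX, computes the full Hessian of a mean-squared-error loss with respect to the flattened parameter vector, and buckets its eigenvalues into four magnitude tiers. The network shape, the sine target, the initialization scale, and the tier thresholds are illustrative assumptions rather than values from our experiments, and in practice the spectrum is evaluated at a trained minimum, not at initialization.

```python
import jax
import jax.numpy as jnp
from jax.flatten_util import ravel_pytree

# Illustrative 1-16-1 ReLU network (shapes and scales are assumptions,
# not the paper's experimental configuration).
def mlp(params, x):
    w1, b1, w2, b2 = params
    h = jnp.maximum(x @ w1 + b1, 0.0)  # ReLU hidden layer
    return h @ w2 + b2

key = jax.random.PRNGKey(0)
k1, k2 = jax.random.split(key)
params = (
    0.5 * jax.random.normal(k1, (1, 16)),
    jnp.zeros(16),
    0.5 * jax.random.normal(k2, (16, 1)),
    jnp.zeros(1),
)

# Toy regression target (assumed for illustration).
x = jnp.linspace(-2.0, 2.0, 64).reshape(-1, 1)
y = jnp.sin(x)

# Flatten the parameter pytree so the Hessian is a single dense matrix.
flat, unravel = ravel_pytree(params)

def loss(flat_params):
    pred = mlp(unravel(flat_params), x)
    return jnp.mean((pred - y) ** 2)

# Full Hessian of the loss w.r.t. all 49 parameters, symmetrized
# against numerical noise before the eigendecomposition.
H = jax.hessian(loss)(flat)
H = 0.5 * (H + H.T)
eigs = jnp.abs(jnp.linalg.eigvalsh(H))

# Bucket eigenvalues into four tiers by magnitude. These thresholds are
# assumed for the sketch; the tier boundaries reported in the paper are
# determined empirically from the spectrum itself.
tiers = {
    "steep":    int(jnp.sum(eigs > 1e-1)),
    "moderate": int(jnp.sum((eigs > 1e-3) & (eigs <= 1e-1))),
    "weak":     int(jnp.sum((eigs > 1e-6) & (eigs <= 1e-3))),
    "zero":     int(jnp.sum(eigs <= 1e-6)),
}
print(tiers)
```

Running the same measurement at checkpoints along a training trajectory, rather than once at the end, is what exposes the phase transition described in finding (2): the count in the steep tier drops abruptly while the loss curve alone shows no comparable signature.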