Abstract
Classification problems can be modeled either deterministically or probabilistically. The two approaches differ in how they represent prior information. In high dimensions, stochastic models often provide finer representations of the data, highlighting the concentration phenomena at the heart of probability and statistics.
The first part of the lecture concerns the foundations of mathematical statistics, established a century ago by Ronald Fisher through the notions of consistent estimators, parametric models, maximum likelihood, and Fisher information. Neural networks are typically trained by maximizing the likelihood of their parameters given the data.
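To fix ideas, these notions admit a standard textbook formulation (a summary for orientation, not material specific to the lecture). Given independent samples $x_1, \dots, x_n$ from a parametric model $p_\theta$, the maximum likelihood estimator is
\[
\hat{\theta} = \arg\max_{\theta} \sum_{i=1}^{n} \log p_\theta(x_i),
\]
and the Fisher information $I(\theta) = \mathbb{E}\big[(\partial_\theta \log p_\theta(X))^2\big]$ controls the best achievable precision of unbiased estimators through the Cramér-Rao bound $\mathrm{Var}(\hat{\theta}) \geq I(\theta)^{-1}$. Minimizing the cross-entropy loss of a neural network classifier amounts to maximizing such a log-likelihood.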
The second part of the lecture concerns Shannon's information theory, which quantifies the intrinsic information carried by data through the notion of entropy. One application is data compression. Complexity theory is approached here through the search for parsimonious structures or regularities, which make it possible to construct parameterizations of high-dimensional probability distributions.
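For concreteness (again a standard definition rather than lecture material), the entropy of a discrete source with distribution $p$ is
\[
H(p) = -\sum_{x} p(x) \log_2 p(x),
\]
and Shannon's source coding theorem identifies $H(p)$ as the minimal average number of bits per symbol achievable by lossless compression.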