Amphithéâtre Guillaume Budé, Site Marcelin Berthelot
En libre accès, dans la limite des places disponibles
-

Résumé

Two basic forms of data analysis are classification and retrieval: given an image, for instance, identify what it depicts, or find other similar images. Problems like these have both a statistical and an algorithmic component. The statistical view focuses on how many data points are needed in order for the results to be meaningful. The algorithmic view focuses on how efficiently retrieval can be performed. Both, it turns out, are subject to a bad "curse of dimension”, that is, they scale poorly with the number of features in the data.

We will talk about a generic technique for overcoming both the statistical and algorithmic curse of dimension.

The starting point is that a lot of data that are apparently high-dimensional, in the sense of having some large number of features D, in fact have much lower “intrinsic dimension” d < D: they lie close to a d-dimensional manifold, for instance. We will formalize this kind of structure, and describe how randomized geometric procedures can be used to automatically adapt to it. This immediately yields methods for classification, regression, and nearest neighbor search whose dependence on dimension is reduced from D to d.

Intervenants

Sandoy Dasgupta

UCSD