Salle 5, Site Marcelin Berthelot
Open to all

Abstract

Since their introduction in 2017, Transformers have profoundly reshaped large language models and deep learning more generally. This success rests largely on the so-called "self-attention" mechanism. In this talk, I will present a mathematical framework that views self-attention as a system of interacting particles. I will describe some remarkable properties of the associated dynamics in the space of probability measures, with particular emphasis on cluster formation, the preservation of Gaussian measures, the subtleties of the associated mean-field limit, and the high "expressivity" of these neural networks.
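To give a concrete flavour of the interacting-particle viewpoint, here is a minimal numerical sketch, not the exact model studied in the talk: tokens are treated as particles on the unit sphere, each of which drifts toward a softmax-weighted average of all the others (an idealized attention update). The inverse temperature `beta`, the Euler time step, and the projection back onto the sphere are illustrative assumptions; with them, the particles visibly cluster.

```python
import numpy as np

def attention_dynamics(X, beta=4.0, dt=0.1, steps=200):
    """Simulate a simplified self-attention particle system.

    Each row of X is a token/particle on the unit sphere. At every step,
    each particle moves toward the softmax-weighted average of all
    particles (the attention drift), projected onto its tangent space
    so that it stays on the sphere. Clustering emerges over time.
    """
    X = X / np.linalg.norm(X, axis=1, keepdims=True)
    for _ in range(steps):
        # Attention weights: row-wise softmax of scaled inner products <x_i, x_j>.
        logits = beta * X @ X.T
        W = np.exp(logits - logits.max(axis=1, keepdims=True))
        W /= W.sum(axis=1, keepdims=True)
        V = W @ X                                   # softmax-weighted averages
        # Project the drift onto the tangent space of the sphere at each x_i.
        drift = V - np.sum(V * X, axis=1, keepdims=True) * X
        X = X + dt * drift
        X /= np.linalg.norm(X, axis=1, keepdims=True)  # renormalize to the sphere
    return X

# Usage: random particles in R^3 collapse into a small number of clusters.
rng = np.random.default_rng(0)
X0 = rng.standard_normal((50, 3))
Xf = attention_dynamics(X0)
print(np.round(Xf[:5], 3))  # many rows end up nearly identical
```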

(Work stemming from several collaborations: two initial articles written with Borjan Geshkovski, Yury Polyanskiy, and Philippe Rigollet; then an article with Andrei Agrachev; and finally a recent article with Léa Bohbot, Gabriel Peyré, and François-Xavier Vialard.)