Salle 5, Site Marcelin Berthelot
Open to all

Abstract

Since their introduction in 2017, Transformers have profoundly reshaped large language models and deep learning more generally. This success rests largely on the so-called "self-attention" mechanism. In this talk, I will present a mathematical framework that views self-attention as a system of interacting particles. I will describe some remarkable properties of the associated dynamics in the space of probability measures, with particular emphasis on cluster formation, the preservation of Gaussian measures, the subtleties of the associated mean-field limit, and the high "expressivity" of these neural networks.
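To give a concrete flavour of the interacting-particle viewpoint, here is a minimal numerical sketch, not the exact model studied in the talk: tokens are treated as particles on the unit sphere, each of which drifts toward a softmax-weighted average of all the others (an idealized attention update). The inverse temperature `beta`, the Euler time step, and the projection back onto the sphere are illustrative assumptions; with them, the particles visibly cluster.

```python
import numpy as np

def attention_dynamics(X, beta=4.0, dt=0.1, steps=200):
    """Simulate a simplified self-attention particle system.

    Each row of X is a token/particle on the unit sphere. At every step,
    each particle moves toward the softmax-weighted average of all
    particles (the attention drift), projected onto its tangent space
    so that it stays on the sphere. Clustering emerges over time.
    """
    X = X / np.linalg.norm(X, axis=1, keepdims=True)
    for _ in range(steps):
        # Attention weights: row-wise softmax of scaled inner products <x_i, x_j>.
        logits = beta * X @ X.T
        W = np.exp(logits - logits.max(axis=1, keepdims=True))
        W /= W.sum(axis=1, keepdims=True)
        V = W @ X                                   # softmax-weighted averages
        # Project the drift onto the tangent space of the sphere at each x_i.
        drift = V - np.sum(V * X, axis=1, keepdims=True) * X
        X = X + dt * drift
        X /= np.linalg.norm(X, axis=1, keepdims=True)  # renormalize to the sphere
    return X

# Usage: random particles in R^3 collapse into a small number of clusters.
rng = np.random.default_rng(0)
X0 = rng.standard_normal((50, 3))
Xf = attention_dynamics(X0)
print(np.round(Xf[:5], 3))  # many rows end up nearly identical
```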

(Work stemming from several collaborations: two initial articles written with Borjan Geshkovski, Yury Polyanskiy, and Philippe Rigollet; then an article with Andrei Agrachev; and finally a recent article with Léa Bohbot, Gabriel Peyré, and François-Xavier Vialard.)