Abstract
Visual scene understanding requires much more than a list of the objects present in the scene and their locations. To understand a scene, plan actions on it, and predict what will happen next, we must extract the relationships between objects (e.g., support and attachment), their physical properties (e.g., mass and material), and the forces acting upon them. One view is that we do this using a "mental physics engine" that represents this information and runs forward simulations to predict what will happen next. Over the last several years we have been testing this idea with Josh Tenenbaum using fMRI. I will review evidence that certain brain regions in the parietal and frontal lobes (but not the ventral visual pathway) behave as expected if they implement a mental physics engine: they respond more strongly when people judge physical rather than visual properties and when they view physical rather than social stimuli (Fischer et al., 2016), and they contain scenario-invariant information about object mass inferred from motion trajectories (Schwettmann et al., 2019), the stability of a configuration of objects (Pramod et al., 2022), and whether two objects are in contact with each other (Pramod et al., 2025). Most tellingly, we can decode predicted collision events from perceived collision events, as expected if these brain regions run forward simulations of what will happen next. I will discuss the engagement of this system not only by rigid "Things" but also by fluid "Stuff" (Paulun et al., 2025), and (at least under some circumstances) by language. I will argue that these findings (along with the poor performance of deep net models on many intuitive physics tasks) provide preliminary evidence for a physics engine in human parietal and frontal cortex.