Is AI Physical?
Context: This is part of a series of posts I am writing with Dmitry Vaintrob, as we aim to unpack some potential value from Quantum Field Theory (QFT). Consider this post as framing why physics and its frameworks can be good for building a science of AI.
Introduction
In ‘Position: Is machine learning good or bad for the natural sciences?’, the authors posit some ontological and epistemological differences between machine learning and the natural sciences, and use these to discuss the contexts in which AI should, and should not, be used for scientific research. For ML, the data defines all of reality (the ontology), and a model is judged to truly represent that reality if it performs well (the epistemology). The article considers neural networks as largely opaque, without a principled understanding of their latent structures.
In contrast, the natural sciences (from now on: physics) are ontologically richer and epistemologically more restrictive; the framework of theories, experimental laws, and (I would add) intuitions guiding scientific practice helps us better predict, understand, and judge the latent structure driving performance. This helps us design experiments, build theories based on empirics, and build intuition for incorporating new ideas into the physics canon. Ptolemy’s epicycles, for example, predict the motion of the planets just as well as Kepler’s laws of planetary motion, but only the Keplerian latent model maintained consistency with the physics framework as it evolved to accommodate heliocentrism and Newtonian gravity[1].
The position paper views AI as a tool to study the natural world; it’s OK to use tools with an incomplete ontology or epistemology as long as you know what you’re doing. In this post, I instead want to ask: can physics be used as a tool to study AI systems? And, if so, could this bring the epistemologies and ontologies of the two fields closer together?
The physical foundations of AI
Institutes like IAIFI, along with groups that study the physics of intelligence, the physics of language models, and related topics, point to a growing trend of using insights from physics to study AI systems. These groups tend to apply physics techniques or intuition piecemeal to representations, computation, or emergence, instead of building a comprehensive science of AI. Deeper connections between physics and neural networks (linked, for example, by the principle of sparsity) have also been on the rise. In 2024, the Nobel Prize in Physics was awarded to Hopfield and Hinton for work that made modern neural networks possible, weighing in on this connection and departing somewhat from a tradition of valuing work that aids in the discovery of new physics, rather than the invention of new tools. The Hopfield network was directly inspired by ideas in statistical mechanics, and used energy landscapes to describe associative memory by storing information as stable states of a fully connected recurrent network. Given an initial state, the network evolves deterministically to lower the energy, which plays a role analogous to a loss function. Hinton’s Boltzmann machine, sometimes described as a stochastic generalization of the Hopfield network, extends this work by adding hidden units and learning the connections between neurons, allowing the network to abstract more complicated features from the data. Importantly, Boltzmann machines also introduce stochasticity, making them an early example of a generative model.
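To make the energy-landscape picture above concrete, here is a minimal sketch of associative recall in a tiny Hopfield-style network. It is an illustration under stated assumptions rather than a reconstruction of the original papers: the Hebbian storage rule, the ±1 encoding, the symmetric weights with no self-connections, and the names train_hebbian, energy, and recall are all choices made for this example.

```python
import random

def train_hebbian(patterns):
    """Build symmetric weights from +/-1 patterns (Hebbian rule, assumed here)."""
    n = len(patterns[0])
    w = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:  # no self-connections
                    w[i][j] += p[i] * p[j] / n
    return w

def energy(w, s):
    """Energy E(s) = -1/2 * sum_ij w_ij * s_i * s_j."""
    n = len(s)
    return -0.5 * sum(w[i][j] * s[i] * s[j] for i in range(n) for j in range(n))

def recall(w, state, sweeps=10, seed=0):
    """Asynchronous deterministic updates; no single update can raise the energy."""
    rng = random.Random(seed)  # only used to pick the update order
    s = list(state)
    n = len(s)
    for _ in range(sweeps):
        order = list(range(n))
        rng.shuffle(order)
        changed = False
        for i in order:
            field = sum(w[i][j] * s[j] for j in range(n))
            new = 1 if field >= 0 else -1
            if new != s[i]:
                s[i] = new
                changed = True
        if not changed:  # reached a stable state (a local energy minimum)
            break
    return s

# Store one pattern, corrupt two entries, and let the network settle.
stored = [1, -1, 1, -1, 1, -1, 1, -1]
w = train_hebbian([stored])
noisy = list(stored)
noisy[0] = -noisy[0]
noisy[3] = -noisy[3]
recovered = recall(w, noisy)
```

In this toy setting the corrupted state rolls downhill in energy until it reaches the stored pattern, which is the sense in which a memory is a stable state. A Boltzmann machine would replace the deterministic sign update with a probabilistic one.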
Though the choice was divisive, the Nobel committee sent an important message with it: there are deep connections between physics and AI systems. At a fireside chat about the Nobel Prize hosted by IAIFI, Hidenori Tanaka noted that the choice indicated that ‘the future is for AI’. Later, Di Luo added that it further indicated that ‘the future is for physics.’
Is ‘Physical’ more than Physics?
How far-reaching are these connections? Could physics’ insights be fundamental to AI systems, or are they convenient metaphors and mathematical happenstance? In previous posts, I wrote about the need to build up the foundational science of AI to ensure its safety. Retooling the methods and structure of physical systems for use in AI is appealing for this reason. However, physics is a scientific culture, complete with its own standards for scientific practice, idiosyncrasies, and historical baggage (something that AI currently lacks). Care should be taken, then, in thinking about how physicists solve problems and the impact this could have on the culture of scientific practice for AI.
On paper, something is ‘physical’ if it obeys the laws of physics. How these laws become established is a matter of some historical and philosophical import that I won’t get into here, but in general there are two approaches to physics research:
Empirics first (bottom-up): This approach abstracts patterns and regularities from observations and measurements, with the goal of building up theories and laws that can be used to predict, explain, or discover phenomena of interest.
Ab initio (top-down): This approach starts with first principles to derive new theories from foundational assumptions and physical constraints. It depends on an existing framework of theory and experiment.
Physics contains tight feedback loops between theory and experiment, and fields often oscillate between bottom-up and top-down phases. How this happens depends on many factors, including research taste and technological advancements, which have the tendency to change throughout history. AI is currently in an empirical phase of research, evidenced by the discovery of neural scaling laws and mounting efforts to abstract systematic insights using mechanistic interpretability techniques. While there is some feedback with theory (mainly toy models), we have yet to build up a theory-practice framework that would support a first-principles approach.
When building theories or interpreting experiments, there are a number of epistemic virtues that operationalize what it means for something to be counted as ‘physical’. These include:
Causality: Physics should understand the relationship between cause and effect. This can be used to form a mechanistic description for the motion of an object subject to external forces, but can also put fundamental bounds on causal relationships. For example, information can’t move faster than the speed of light, so everything in the universe is restricted to its own cone of causal influence.
Universality: The same quantitative theory can describe qualitatively different systems. The Ising model, for example, can be used to describe both the ferromagnetic phase transition and the liquid-gas critical point of water. This has implications for the renormalization group and scale that I hope to touch on in a later post.
Consistency: Physics should be locally logically coherent and scale as expected. String theory should reduce to general relativity in the right classical limit, and general relativity should reduce to Newtonian gravity in the weak-field, low-velocity limit.
Physics Could be Good for AI… But we should be wary of overextending its power
There are obvious benefits for applying the above criteria to AI systems. Scaling laws, for example, predict how model performance scales with size, data, or compute, but it is not well understood which of these laws are universal. Moreover, we lack a causal or consistent understanding of how emergent behaviors arise at specific scales.
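As a rough illustration of what a scaling law looks like in practice, the sketch below fits a pure power law, loss = a * N^(-alpha), by linear regression in log-log space. The functional form (with no irreducible-loss offset), the function name fit_power_law, and the numbers are all assumptions for this example; the data are synthetic and are not measurements from any real model.

```python
import math

def fit_power_law(sizes, losses):
    """Least-squares fit of log(loss) = log(a) - alpha * log(size)."""
    xs = [math.log(n) for n in sizes]
    ys = [math.log(l) for l in losses]
    m = len(xs)
    x_mean = sum(xs) / m
    y_mean = sum(ys) / m
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return math.exp(intercept), -slope  # (a, alpha)

# Synthetic, made-up data generated from loss = 5.0 * N^(-0.3).
sizes = [1e6, 1e7, 1e8, 1e9]
losses = [5.0 * n ** -0.3 for n in sizes]
a, alpha = fit_power_law(sizes, losses)
```

A fit like this only summarizes a trend. Whether the exponent is universal across architectures, datasets, or training setups is exactly the kind of question the fit alone cannot answer.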
There are some reasons to be cautious in our attempts to build a comprehensive physics of AI. For one, physicists (namely theorists) have a tendency to over-value simplicity (e.g. by seeing everything as a Gaussian). In the messy world of web data and hidden engineering details, simplified assumptions may not work as well as we want them to. Second, maybe the epistemic virtues of physics don’t broadly apply to the ‘alien’ AI universe, meaning that physics techniques may only be useful in ad-hoc applications. Similarly, though computational systems are physical in the sense that they were designed and built using physics, in practice they may be more aligned with computer science heuristics than physical laws. Are the techniques that work in practice really theoretical CS (math) in disguise? Do they contain anything ‘physical’?
Nevertheless, physics is notoriously good at spanning scales[2] and studying complex systems in a reductionist sense (what equations govern the motion of each individual particle?) as well as an emergent sense (what are the large-scale properties of the particle system as a whole?). Physical systems often have multiple or competing scales, and physicists are generally good at finding the ‘right’ (natural, physical, dimensionless) parameters to expand in or scale. The resulting scaling limits can be used to sanity check empirical results and bridge theories of different computational regimes. From my perspective, AI doesn’t have enough principled scaling limits. In a different way, physics can also put theoretical limits on what it is possible to predict about intelligent systems, similar to no-go theorems in physics. For example, we are unable to enumerate all the potential ways that an AI agent could interact with the world because of sensitivity to initial conditions. This last point has a particular impact on AI safety arguments about the trustworthiness and control of AI systems.
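Sensitivity to initial conditions can be seen in a very small toy system. The sketch below uses the logistic map at r = 4, a standard textbook example of chaos and not a model of an AI agent; the function name logistic_trajectory and the chosen starting values are assumptions for illustration only.

```python
def logistic_trajectory(x0, r=4.0, steps=60):
    """Iterate x -> r * x * (1 - x) and return the full trajectory."""
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs

# Two starting points that differ by one part in ten billion.
a = logistic_trajectory(0.2)
b = logistic_trajectory(0.2 + 1e-10)

# Distance between the two trajectories at each step.
gaps = [abs(x - y) for x, y in zip(a, b)]
```

The gap starts out negligible and grows roughly exponentially until the two trajectories are effectively unrelated, so no finite measurement precision on the starting point suffices for long-horizon prediction. The analogy to agents acting in an open world is loose, but it shows why enumeration of outcomes can fail in principle and not just in practice.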
If physics is all about understanding the latent structure of AI systems: is physics interpretability? Indeed, a lot of the virtues of physics echo those of a ‘good’ model of interpretability laid out in Dmitry Vaintrob’s recent post. To recap:
>‘There is no such thing as interpreting a neural network. There is only interpreting a neural network at a given scale of precision.’
For the reasons mentioned above, physics is likely good at providing insights that are neither so coarse as to be useless nor so fine as to be overwhelming. Hopefully, it can be used to build intuition for when details can be abstracted away and when they matter. In the case that there is more than one mechanistic description, a physics framework can help us make judgements about which theories are valid and in which regimes. To Dmitry’s koan, I would add that interpretability is more subjective in practice because it relies on a quantitative understanding of qualitative (human interpretable) features. We can sometimes home in on a mechanistic description of neural network features for a particular problem set-up, but these insights do not universally or robustly apply. For this, maybe physics can provide a razor between questions that have scientific answers and those that do not.
Physics as a Way of Thinking
These thoughts reflect the idea that physics is more than a framework of theories and laws – it’s also a scientific culture with a set of standards, values, and goals that both explicitly and implicitly guide intuition, and these are equally important when adopting aspects of a scientific practice. Importantly, physics does not strive purely for mathematical rigor, nor does it use random intuition. Physicists are comfortable with incompleteness and approximation, including some hand waving and order of magnitude thinking. They also have an intuition for when to take orders of scale seriously, and when three can be taken as ‘large’.
In general, I think the physics way of thinking strikes the right balance between rigor and estimation for AI research, and may be a first approximation to an AI safety way of thinking that can be refined as AI safety’s scientific foundations are built. However, we need to be careful using a single discipline as a razor for what makes ‘good science’ in a field without a pre-existing research paradigm. There is a danger in over-indexing on any particular field’s point of view too early, and I tend to think that an open-minded, multi-faceted (maybe even anarchistic) approach may be best to ensure AI safety gets it right the first time. Maybe physics’ tricks, intuitions, or idiosyncrasies only apply in physics because we already have physics to support them. Would they work for AI without better scientific foundations?[3] While I am excited by a broader application of physics to AI systems in order to more comprehensively understand them, I think we should be clear that at this point we are merely doing what we can, and make a concerted effort in our research to note what we do, what others do, and what the science as a whole is doing.
Teaser: A QFT of AI
I am particularly excited about the application of quantum field theory (QFT) techniques to AI systems. I am currently working on a series of posts about these ideas with Dmitry Vaintrob (Dmitry’s background posts can be found here). We are working on a follow-up post about why we think this could be important for building a more comprehensive understanding and scientific foundations for AI systems.
- ^
The Keplerian model is also favorable according to Occam’s razor. I think this is a good example that gets at the point that we don’t have a way to understand the degeneracy of latent models in ML, since these are all equal proxies for the data. However, we should not think too hard about the implications of shifting paradigms on this analogy.
- ^
‘Scale’ is a term that requires some unpacking, which I aim to do in a future post. For now, I’ll define it as a dimensionless parameter that sets the level of granularity in a system with interacting degrees of freedom.
- ^
Moreover, intuition for what is ‘good’ or ‘ok’ can vary between scientific subcultures within physics. Historically, these have valued different levels of rigor or connection with empiricism.
I would love an excuse to go back and learn QFT. Looking forward to your QFT AI insights :D
An interesting analogy, closer to ML, would be to look at neuroscience. It’s an older field than ML, and it seems that the physics perspective has been fairly productive, even though it has not yet provided a grand unified theory of cognition. Some examples:
Using methods from electric circuits to explain neurons (Hodgkin-Huxley model, cable theory)
Dynamical systems to explain phenomena like synchronization in neuronal oscillations (ex: Kuramoto model)
Ising models to model some collective behaviour of neurons
Information theory is commonly used in neuroscience to analyze neural data and model the brain (ex: efficient coding hypothesis)
Attempts at general theories of cognition like predictive processing, or the free energy principle, which also has a strong physics inspiration (drawing from statistical physics and the least action principle)
I can recommend the book Models of the Mind, by Grace Lindsay, which gives an overview of the many ways physics has contributed to neuroscience.
In principle, one might think that it would be easier to make progress using a physics perspective on AI than in neuroscience, for example because it is easier to do experiments in AI (in neuroscience we do not have access to the value of the weights, we do not always have access to all the neurons, and often it is not possible to intervene on the system).
For example: https://www.lesswrong.com/posts/EhTMM77iKBTBxBKRe/the-laws-of-large-numbers