A more important difference—and a more genuine criticism of this analogy—is that mathematical physics is of course applied to the real, natural world. And perhaps there really is something about nature that makes it fundamentally amenable to mathematical description in a way that just won’t apply to a large neural network trained by some sort of gradient descent? Indeed one does have the feeling that the endeavour we are focussing on would have to be underpinned by a hope that there is something sufficiently ‘natural’ about deep learning systems that will ultimately make at least some higher-level aspects of them amenable to mathematical analysis. Right now I cannot say how big of a problem this is.
There is no difference between natural phenomena and DNNs (LLMs, whatever). DNNs are 100% natural; you don’t seriously believe there is something supernatural in their workings, do you? Hence, the criticism is invalid and the problem is non-existent.
See “AI as physical systems” for more on this. And in the same vein: “DNNs, GPUs, and their technoevolutionary lineages are agents”.
I think that a lot of AI safety and AGI capability researchers are confused about this. They see information and computation as mathematical rather than physical. The physicalism of information and computation is a very important ontological commitment one has to make to deconfuse oneself about AI safety. If you wish to “take this pill”, see Fields et al. (2022a), section 2.
I think this confusion of the study of AI with mathematics (rather than with physics and cognitive science, i.e., natural sciences) leads you and some other people to think that new mathematics has to be developed to understand AI (I may be misinterpreting you, but it definitely seems from the post that you think this is true, e.g., from your example of algebraic topology). It might be that we will need new mathematics, but it’s far from certain. As an example, take “A mathematical framework for transformer circuits”: it doesn’t develop new mathematics. It just uses existing mathematics: tensor algebra.
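For concreteness, the central identity of that paper (reproduced here from memory, so the notation may differ slightly from the original) expands the logits of a one-layer attention-only transformer as a sum of tensor products:

```latex
% One-layer attention-only transformer, as in "A mathematical framework
% for transformer circuits" (notation reproduced from memory):
% W_E / W_U are the embedding / unembedding matrices, A^h is the attention
% pattern of head h, and W_OV^h = W_O^h W_V^h is the head's "OV circuit".
\[
T \;=\; \mathrm{Id} \otimes W_U W_E
  \;+\; \sum_{h \in H} A^{h} \otimes \left( W_U W_{OV}^{h} W_E \right)
\]
```

Every ingredient here (identity map, matrix products, tensor products, the softmax inside A^h) is indeed pre-existing linear and tensor algebra.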
I think the research agenda of “weaving together theories of cognition and cognitive development, ML, deep learning, and interpretability through the abstraction-grounding stack” could plausibly lead to the sufficiently robust and versatile understanding of DNNs that we want[1], without the need to develop much or any new mathematics along the way.
Here’s what the “abstraction-grounding stack” looks like:
Many of the connections between theories of cognition and cognitive development at different levels of specificity are not established yet, and therefore present a lot of opportunities to verify specific mechanistic interpretability theories:
I haven’t heard of any research attempting to connect FEP/Active Inference (Fields et al. 2022a; Friston et al. 2022) with the many theories of DNNs and deep learning, such as Balestriero (2018), Balestriero et al. (2019), or Roberts, Yaida & Hanin (2021); see the free-energy decomposition below this list for the quantity such a connection would have to engage with. Note that Marciano et al. (2022) make such a connection between their own theory of DNNs and Active Inference. Apart from Active Inference, some other theories of intelligence and cognition (Boyd et al. 2022; Ma et al. 2022) are “ML-first” and thus cover both “general cognition” and “ML” levels of description at once.
On the next level, interpretability theories (at least those authored by Anthropic) are not yet connected to any general theories of cognition, ML, DNNs, or deep learning. In particular, I think it would be very interesting to connect theories of polysemanticity (Elhage et al. 2022; Scherlis et al. 2022) with general theories of contextual/quantum cognition (see Basieva et al. (2021), Pothos & Busemeyer (2022), and Fields & Glazebrook (2022) for some recent reviews, and Fields et al. (2022a) and Tanaka et al. (2022) for examples of recent work). A toy sketch of the superposition picture of polysemanticity is given below this list.
I hypothesise a “skip-level” connection between quantum FEP (Fields et al. 2022a) and the circuits theory (Olah et al. 2020), identifying quantum reference frames with features in DNNs. This connection should be checked and compared with Marciano et al. (2022).
All the theories of polysemanticity (Elhage et al. 2022; Scherlis et al. 2022) and grokking (Liu et al. 2022; Nanda et al. 2023) draw associative connections to phase transitions in physics, but, as far as I can tell, none of them has yet been connected more rigorously with physical theories of dynamical systems, criticality, and emergence, or used to propose falsifiable predictions about the behaviour of NNs that would follow from those physical theories; the toy order-parameter example below this list shows the kind of structure a rigorous connection would need to exhibit.
Fields et al. (2022b) propose topological quantum neural networks (TQNNs) as a general framework of neuromorphic computation, and some theory of their development. Marciano et al. (2022) establish that DNNs are a semi-classical limit of TQNNs. To close the “developmental” arc, we should identify how the general theories of neuromorphic development, evolution, and selection map onto the theories of feature and circuit development, evolution, and selection inside DNNs or, specifically, transformers.
I don’t yet understand how, but perhaps the connections between ML, fractional dynamics, and the renormalisation group, identified by Niu et al. (2021), could help to better understand, verify, and contextualise some mechanistic interpretability theories as well.
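To make the first of these concrete: any FEP-to-deep-learning connection would have to relate a DNN training objective to the variational free energy that Active Inference agents are said to minimise. The decomposition below is the standard textbook one, added here only for reference, not taken from any of the papers cited above:

```latex
% Variational free energy F for observations o, latent states s,
% generative model p(o, s), and approximate posterior q(s):
\[
F \;=\; \mathbb{E}_{q(s)}\!\left[ \ln q(s) - \ln p(o, s) \right]
  \;=\; \underbrace{D_{\mathrm{KL}}\!\left[ q(s) \,\|\, p(s) \right]}_{\text{complexity}}
  \;-\; \underbrace{\mathbb{E}_{q(s)}\!\left[ \ln p(o \mid s) \right]}_{\text{accuracy}}
\]
% F is the negative ELBO; the formal resemblance to variational DNN losses
% is what a rigorous FEP <-> deep learning connection would need to make precise.
```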
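On polysemanticity: here is a minimal numpy sketch of superposition, in the spirit of (but not reproducing) the toy models of Elhage et al. (2022). With more features than neurons, features must share neurons, and the resulting interference is exactly what a theory of polysemanticity has to account for:

```python
import numpy as np

rng = np.random.default_rng(0)

n_features, n_neurons = 8, 4           # more features than neurons
W = rng.normal(size=(n_neurons, n_features))
W /= np.linalg.norm(W, axis=0)         # unit-norm embedding per feature

# Embed each one-hot feature into neuron space, then read it back out.
readout = W.T @ W                      # (n_features, n_features)

# Diagonal ~ 1: each feature is mostly recovered. Off-diagonal entries
# are interference between features forced to share neurons, i.e. every
# neuron responds to several features and is "polysemantic".
print(np.round(readout, 2))
```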
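And on phase transitions: for reference, here is the textbook mean-field Ising model, where an explicit order parameter m becomes non-zero below a critical temperature. This is standard statistical mechanics, included only as a reference point; identifying the analogous order and control parameters for grokking is precisely the open problem:

```python
import numpy as np

# Mean-field Ising model: the magnetisation m solves m = tanh(J * m / T).
# Below T_c = J a non-zero solution appears; m is continuous at T_c but
# non-analytic there (a second-order phase transition).
J = 1.0
for T in np.linspace(0.5, 1.5, 11):
    m = 1.0                            # start in the ordered state
    for _ in range(10_000):            # fixed-point iteration
        m = np.tanh(J * m / T)
    print(f"T = {T:.2f}  m = {m:.4f}")
```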
I’d refrain from saying “general theory” because general theories of DNNs already exist, and in large numbers. I’m not convinced that these theories are insufficient for our purposes, nor that new general theories should be developed. However, what is definitely lacking are the connections between the theories throughout the “abstraction-grounding stack”. This is explained in more detail in the description of the agenda. See also the quote from here:

>We (general intelligences) use science (or, generally speaking, construct any models of any phenomena) for the pragmatic purpose of being able to understand, predict and control it. Thus, none of the disciplinary perspectives on any phenomena should be regarded as the “primary” or “most correct” one for some metaphysical or ontological reasons. Also, this implies that if we can reach a sufficient level of understanding of some phenomenon (such as AGI) by opportunistically applying several existing theories then we don’t need to devise a new theory dedicated to this phenomenon specifically: we already solved our task without doing this step.
>There is no difference between natural phenomena and DNNs (LLMs, whatever). DNNs are 100% natural
I mean “natural” as opposed to “man-made”, i.e., something like “occurs in nature without being built by something or someone else”. So in that sense, DNNs are obviously not natural in the way that the laws of physics are.
I don’t see information and computation as only mathematical; in fact, in my analogies I describe the mathematical abstractions we build as being separate from the things that one wants to describe or make predictions about. And this applies to the computations in NNs too.
I don’t want to study AI as mathematics, and I don’t believe that AI is mathematics. I write that the practice of doing mathematics will only seek out the parts of the problem that are actually amenable to it; and my focus is on interpretability, not other places in AI where one might use mathematics (like, say, decision theory).
You write: “As an example, take ‘A mathematical framework for transformer circuits’: it doesn’t develop new mathematics. It just uses existing mathematics: tensor algebra.” I don’t think we are using ‘new mathematics’ in the same way, and I don’t think the way you are using it is commonplace. Yes, I am discussing the prospect of developing new mathematics, but this doesn’t only mean something like ‘making new definitions’ or ‘coming up with new objects that haven’t been studied before’. If I write a proof of a theorem that “just” uses “existing” mathematical objects, say matrices, or finite sets, then that seems to have little bearing on how ‘new’ the mathematics is. It may well be a new proof, of a new theorem, containing new ideas, etc. And it may well need to have been developed carefully over a long period of time.
I feel that you are redefining terms. Writing down mathematical equations (or defining other mathematical structures that are not equations, e.g., automata) to describe natural phenomena, and proving some properties of these, i.e., deriving mathematical conjectures/theorems: that’s exactly what physicists do, and they call it “doing physics” or “doing science” rather than “doing mathematics”.
>I mean “natural” as opposed to “man-made”, i.e., something like “occurs in nature without being built by something or someone else”. So in that sense, DNNs are obviously not natural in the way that the laws of physics are.

I wonder how you would draw the boundary between “man-made” and “non-man-made”, a boundary that would have a bearing on such a fundamental qualitative distinction between phenomena as their amenability to mathematical description.
According to Fields et al.’s theory of semantics and observation (“quantum theory […] is increasingly viewed as a theory of the process of observation itself”), which is consistent with predictive processing and with Seth’s controlled hallucination theory (itself a descendant of predictive processing), any observer’s phenomenology is what makes mathematical sense by construction. Also, here Wolfram calls approximately the same thing “coherence”.
Of course, there are infinitely many phenomena, both in “nature” and among “man-made things”, whose mathematical description would not yet fit our brains, but this also means that we cannot spot these phenomena. We can extend the capacity of our brains (e.g., through cyborgisation or mind upload), as well as equip ourselves with more powerful theories that allow us to compress reality more efficiently and thus spot patterns that were not spottable before, but this automatically means that these patterns become mathematically describable.
This, of course, implies that we ought to make our minds stronger (through technical means or by developing science) precisely so that we can spot, in time, the phenomena that are about to “catch us”. This is the central point of Deutsch’s “The Beginning of Infinity”.
Anyway, there is no point in arguing this fiercely, because I’m kind of on “your side” here, arguing that your worry that developing theories of DL might be doomed is unjustified. I’d just call these theories scientific rather than mathematical :)