Matthew Barnett comments on An Analytic Perspective on AI Alignment

Matthew Barnett 2 Mar 2020 20:37 UTC
LW: 2 AF: 1
AF
I think it’s plausible that there will be a simple basin that we can regularise an AGI into, because I have some ideas about how to do it, and because the world hasn’t thought very hard about the problem yet (meaning the lack of extant solutions is to some extent explained away).
That makes sense. More pessimistically, one could imagine that the reason why no one has thought very hard about it is because in practice, it doesn’t really help you that much to have a mechanistic understanding of a neural network in order to do useful work. Though perhaps as AI becomes more ‘agentic’ you think that will cease to be the case?
I also think that there exists a relatively simple mathematical backbone to intelligence to be found (but not that all intelligent systems have this backbone), because I think promising progress has been made in mathematising a bunch of relevant concepts (see probability theory, utility theory, AIXI, reflective oracles). But this might be a bias from ‘growing up’ academically in Marcus Hutter’s lab.
I had read your comment thread on Realism about Rationality a while back, and I was under the impression that your stance was something like “rationality is as real as liberalism” or something like that. A relatively simple backbone in the same ballpark as probability theory, utility theory etc. seems way more realist than that.
I also have an intuition for why focusing on these mathematical theories might bias us towards thinking that intelligence can be described mathematically, but it’s a difficult intuition to convey, so bear with me.
First, an observation: the reason why the simple theories of intelligence don’t produce intelligence in practice is because direct computations for them are extremely expensive. There are ways to reduce the compute draw for them to work, but the “things you do to increase compute efficiency of intelligence” is arguably the hardest part about building intelligent machines, and the part that makes up the majority of conceptual space for understanding them. Therefore, understanding real-world intelligent machines requires mostly understanding the tricks they do to be compute-efficient, rather than understanding the mathematical underpinnings.
This intuition is a bit vague, but maybe you saw what I was trying to say?
That being said, I have the feeling that this answer isn’t satisfactorily detailed, so maybe you want more detail, or are thinking of a critique I haven’t thought of?
I care primarily about AI deception at the moment, and I suspect the biggest reason an AI would deceive us is because it received an input that was off-distribution that caused it to act weird. Input-specific interpretability allows us to detect those cases when they arise. Mechanistic transparency might help, but only if the mathematical description of the AI is amenable to real-world analysis.
Most likely, a mathematical description will be long and complex, and the developers will have to pay a high cost to understand how the description could imply deception (But given what you said above about a simple basin, I think this is probably a crux).
- DanielFilan 2 Mar 2020 21:06 UTC
  LW: 5 AF: 2
  AF Parent
  I’ll just respond to the easy part of this for now.
  
  I had read your comment thread on Realism about Rationality a while back, and I was under the impression that your stance was something like “rationality is as real as liberalism” or something like that. A relatively simple backbone in the same ballpark as probability theory, utility theory etc. seems way more realist than that.
  
  That’s not what I said. Because it takes ages to scroll down to comments and I’m on my phone, I can’t easily link to the relevant comments, but basically I said that rationality is probably as formalisable as electromagnetism, but that theories as precise as that of liberalism can still be reasoned about and built on.
  - Matthew Barnett 2 Mar 2020 21:13 UTC
    LW: 2 AF: 1
    AF Parent
    That’s not what I said.
    That’s fair. I didn’t actually quite understand what your position was and was trying to clarify.
- DanielFilan 10 Mar 2020 18:08 UTC
  LW: 3 AF: 2
  AF Parent
  FWIW I take this work on ‘circuits’ in an image recognition CNN to be a bullish indicator for the possibility of mechanistic transparency.
- DanielFilan 5 Mar 2020 20:34 UTC
  LW: 2 AF: 1
  AF Parent
  
  More pessimistically, one could imagine that the reason why no one has thought very hard about it is because in practice, it doesn’t really help you that much to have a mechanistic understanding of a neural network in order to do useful work.
  
  I think I just think the ‘market’ here is ‘inefficient’? Like, I think this just isn’t a thing that people have really thought of, and those that have have gained semi-useful insight into neural networks by doing similar things (e.g. figuring out that adding a picture of a baseball to a whale fin will cause a network to misclassify the image as a great white shark). It also seems to me that recognition tasks (as opposed to planning/reasoning tasks) are going to be the hardest to get this kind of mechanistic transparency for, and also the kinds of tasks where transparency is easiest and ML systems are best.
  
  Therefore, understanding real-world intelligent machines requires mostly understanding the tricks they do to be compute-efficient, rather than understanding the mathematical underpinnings.
  
  I think I understand what you mean here, but also think that there can be tricks that reduce computational cost that have some sort of mathematical backbone—it seems to me that this is common in the study of algorithms. Note also that we don’t have to understand all possible real-world intelligent machines, just the ones that we build, making the requirement less stringent.