I definitely support having models that engage more with the messiness of the real world. I’m not sure I would have used “wide models”; even the assumption of a crisp core seems to make the model less capable of handling messiness than I’d like. But if you’re trying to get formal guarantees and you need to use some model, a wide model seems like a reasonable one to use.
This also seems useful for helping explain why I think alignment is likely difficult. If intelligence is shaped like an onion, then perhaps you can have an aligned seed that then grows out to be wide and have many layers. But if intelligence is instead shaped like a pyramid, then we can’t place a capstone of alignment and build downwards; the correctness of the more abstract levels depends on the foundations on which they rest.
This doesn’t (yet) seem like an argument that alignment is likely difficult. Why should intelligence be shaped like a pyramid? Even if it is, how does alignment depend on the shape of intelligence? Intuitively, if intelligence is shaped like a pyramid, then it’s just really hard to get intelligence, and so we don’t build a superintelligent AI.
Agreed that the rest of the argument is undeveloped in the OP.
First is the argument that animal intelligence is approximately pyramidal in its construction, with neurons serving roles at varying levels of abstraction, and (importantly) higher layers being expressed in terms of neurons at lower layers, in basically the way that the layers of an artificial neural network are built on top of one another.
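To make that layered picture concrete, here’s a minimal sketch (my own toy illustration; the sizes and weights are arbitrary stand-ins) in which each level is computed only from the level below it, so what the top level does depends on every layer underneath:

```python
# Minimal sketch of a "pyramid": each layer is a function only of the layer
# below it, so the top layer's behavior depends on every layer underneath.
# Layer sizes and weights are arbitrary stand-ins.
import numpy as np

rng = np.random.default_rng(0)
layer_sizes = [16, 8, 4, 2]  # wide base, narrow top
weights = [rng.normal(size=(m, n)) for n, m in zip(layer_sizes, layer_sizes[1:])]

def top_level(x):
    for W in weights:
        x = np.maximum(W @ x, 0.0)  # each step only sees the previous layer's output
    return x

print(top_level(rng.normal(size=16)))
```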
Alignment can (sort of) be viewed as a correspondence between intelligences. One might analogize this to comparing two programs and trying to figure out if they behave similarly. If the programs are neural networks, we can’t just look at the last layer and see if the parameter weights line up; we have to look at all the parameters, and do some complicated math to see if they happen to be instantiating the same (or sufficiently similar) functions in different ways. For other types of programs, checking that they’re the same is much easier; for example, consider the problem of showing that two formulations of a linear programming problem are equivalent.
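To illustrate why looking at the last layer isn’t enough, here’s a toy sketch (my own example, not from the original discussion): two small ReLU networks whose parameters look nothing alike but which compute exactly the same function, because one’s hidden units are a permutation of the other’s. A weight-by-weight comparison says “different”; probing their behavior says “same”.

```python
# Two ReLU networks with different-looking parameters that implement the same
# function: network B's hidden units are a permutation of network A's.
import numpy as np

rng = np.random.default_rng(0)

def forward(params, X):
    W1, b1, W2, b2 = params
    H = np.maximum(X @ W1.T + b1, 0.0)  # hidden layer
    return H @ W2.T + b2                # output layer

# Network A: 3 inputs, 4 hidden units, 2 outputs.
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)
W2, b2 = rng.normal(size=(2, 4)), rng.normal(size=2)
net_a = (W1, b1, W2, b2)

# Network B: same function, hidden units shuffled by permutation P.
P = np.eye(4)[[2, 0, 3, 1]]
net_b = (P @ W1, P @ b1, W2 @ P.T, b2)

X = rng.normal(size=(100, 3))
weights_match = all(np.allclose(a, b) for a, b in zip(net_a, net_b))
behavior_match = np.allclose(forward(net_a, X), forward(net_b, X))
print(weights_match, behavior_match)  # False True
```

Even this check only samples behavior on some inputs; for two formulations of a linear program, by contrast, you can often exhibit an explicit transformation between them and settle equivalence directly.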
I think “really hard” is an overstatement here. It looks like evolution built lizards, then mammals, then humans by gradually adding on layers, and it seems similarly possible that we could build a very intelligent system by hooking together lots of subsystems that perform their roles ‘well enough’, but without the sort of meta-level systems that ensure the whole system does what we want it to do. People often have the intuition that either the system will fail to do anything at all or it will do basically what we want, and I don’t think that’s true.
I’m not sure that this implies that alignment is hard. If you’re trying to prove that your system is aligned by looking at the details of how it is constructed and showing that it all works together, then yes, alignment is harder than it would be otherwise. But you could imagine other versions of alignment, e.g. taking intelligence as a black box and pointing it in the right direction. (For example, if I magically knew the true human utility function and put that into the black box, the outcomes would probably be good.)
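A minimal sketch of that black-box picture (all names below are hypothetical, purely for illustration): the planner’s internals are never inspected, and alignment consists only of handing it the objective we actually want optimized.

```python
# Hypothetical sketch of "point the black box in the right direction":
# we never look inside the planner; we only supply the objective.
from typing import Callable, List

def point_the_black_box(
    planner: Callable[[Callable[[str], float]], str],  # opaque, arbitrarily capable
    true_human_utility: Callable[[str], float],        # the "magically known" utility
) -> str:
    return planner(true_human_utility)

# Toy stand-in planner: exhaustive search over a tiny action set.
actions: List[str] = ["bake bread", "burn bread", "do nothing"]
toy_planner = lambda utility: max(actions, key=utility)

print(point_the_black_box(toy_planner, lambda a: 1.0 if a == "bake bread" else 0.0))
# prints: bake bread
```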
Here when I say “aligned” I mean “trying to help”. It’s still possible that the AI is incompetent and fails because it doesn’t understand what the consequences of its actions are.
Cool, I think I mostly agree with you.