Alas, in the real world I suspect we would have to accept a system that would only kill us in its omnipotent limit; that is, if neural models are a path to AGI, we are not going to have lots of formal guarantees about how a model’s utility is shaped, but we are going to have a lot of control over how the model’s computation is shaped. I don’t agree that the difference here is just one of model scale, as most of the properties listed are qualitative differences, not quantitative ones, and backpropagation bakes these biases directly into the model, meaningfully shaping the kind of reasoning it can do.
My interlude was aimed at this sort of response, because it defocuses the map if you aren’t able to point at what your models of the world actually say about it. I was never advocating that this model was safe in reality (I hope the tone made that clear within the first few sentences), so I’m not concerned if the argument is a Bad Thing, just that it is a useful test dummy for people to start saying (or at least thinking) concrete things about.
I was under the impression that the Yudkowsky view is that “optimality” and “agency” are the same thing. “Agency” is just coherent optimization.
What I expect most people mean by optimality is the degree to which something approaches a best answer. A nuclear weapon has a lot of optimality in it, given its domain. It isn’t an agent. I don’t think optimality and coherent optimisation can be the same thing, because lots of optimal things, like best fit lines on charts, do not do optimisation; they just are.
I expect Yudkowsky’s position to look more like, well, this:
the reason why I don’t expect the GPT-5s to be competitive with Living Zero is that gradient descent on feedforward transformer layers, in order to learn science by competing to generate text that humans like, would have to pick up on some very deep latent patterns generating that text, and I don’t think there’s an incremental pathway there for gradient descent to follow—if gradient descent even follows incremental pathways as opposed to finding lottery tickets, but that’s a whole separate open question of artificial neuroscience.
in other words, humans play around with legos, and hominids play around with chipping flint handaxes, and mammals play around with spatial reasoning, and that’s part of the incremental pathway to developing deep patterns for causal investigation and engineering, which then get projected into human text and picked up by humans reading text
it’s just straightforwardly not clear to me that GPT-5 pretrained on human text corpuses, and then further posttrained by RL on human judgment of text outputs, ever runs across the deep patterns
in that he is distinguishing quite strongly between something optimised-to-be-good-at and something actually-doing-the-optimising. My example was chosen in large part to rule out this coherent internal optimisation loop, and to have its behavior describable with only the short forward inference steps a GPT-5 model might conceivably be able to do, explicitly excluding the qualitative changes he suspects it would struggle to learn. But I don’t want to put more words in his mouth than that.