Another angle: number of bits of optimization required is a direct measure of “how far out of distribution” we need to generalize.
I think it’s useful to distinguish between the amount of optimization we ask the model to do versus the unlikelihood of the world we ask it to simulate.
For instance, I can condition on something trivial like “the weather was rainy on 8/14, sunny on 8/15, rainy on 8/16...”. This specifies a very unlikely world, but so long as the pattern I specify is plausible, it doesn’t require much optimization on the part of the model or take me far out of distribution. There can be many, many plausible patterns like this: the weather is a chaotic system with a lot of intrinsic uncertainty, so there’s actually a lot of room to play here.
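To make the bit-accounting concrete, here’s a minimal sketch (my own toy model, not anything from the thread): if each day’s weather is treated as an independent coin flip, conditioning on a specific 30-day sequence costs about 30 bits of “world unlikelihood”, but since every specific sequence is equally plausible under the model, none of those bits demand extra optimization from the simulator.

```python
import math

# Toy assumption (mine, not from the thread): each day is independently
# rainy with probability 0.5, so every specific sequence is equally plausible.
P_RAIN = 0.5

def surprisal_bits(sequence, p_rain=P_RAIN):
    """Bits of 'world unlikelihood' spent by conditioning on this exact sequence."""
    return sum(-math.log2(p_rain if day == "rain" else 1 - p_rain)
               for day in sequence)

specified = ["rain", "sun", "rain"] * 10   # 30 specific days
print(surprisal_bits(specified))           # -> 30.0 bits

# The world is made ~2^30 times less likely, but this pattern is no harder to
# simulate than any other 30-day sequence: the bits are absorbed by the
# weather's intrinsic uncertainty, not by optimization on the model's part.
```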
That’s a silly example, but there are more useful ones. Suppose I condition on a sequence of weather patterns (all locally plausible) that affect voter turnout in key districts such that politicians get elected who favor policies that shift the world towards super-tight regulatory regimes on AI. That lets me push down the probability that there’s a malicious AI in the simulated world without requiring the model itself to perform crazy amounts of optimization.
Granted, when the model tries to figure out what this world looks like, there’s a danger that it says “Huh, that’s a strange pattern. I wonder if there’s some master-AGI engineering the weather?” and simulates that world. That’s possible, and the whole question is about whether the things you conditioned on pushed down P(bad AGI controls the world) faster than they made the world-writ-large unlikely.
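As a hedged illustration of that race (all numbers below are made up for the example), the bookkeeping is just Bayes: the conditioning buys a reduction in P(malicious AGI) at the cost of making the conditioned-on events themselves unlikely, and the danger in the “master-AGI engineered the weather” story is exactly that it raises P(pattern | malicious AGI) and flips the update.

```python
import math

# Hypothetical numbers, purely to illustrate the tradeoff described above.
p_agi = 0.10                  # prior P(malicious AGI controls the simulated world)
p_pattern_given_agi = 1e-6    # P(this exact weather/turnout pattern | malicious AGI)
p_pattern_given_none = 1e-4   # P(this exact pattern | no malicious AGI)

# How unlikely the conditioning made the world-writ-large, in bits.
p_pattern = p_agi * p_pattern_given_agi + (1 - p_agi) * p_pattern_given_none
bits_spent = -math.log2(p_pattern)

# How far the conditioning pushed down P(malicious AGI).
posterior = p_agi * p_pattern_given_agi / p_pattern

print(f"world made ~{bits_spent:.1f} bits less likely")      # ~13.4 bits
print(f"P(malicious AGI): {p_agi:.2f} -> {posterior:.4f}")    # 0.10 -> ~0.0011

# If the model instead reads the strange pattern as evidence of an engineering
# AGI (p_pattern_given_agi >> p_pattern_given_none), the same update reverses.
```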
I also think that intelligent agents throw off this kind of analysis. For example, suppose you enter a room and find that I have 10 coins laid out on a table, all heads. Did this happen by chance? It’s unlikely: 1 in 1024. Except obviously I arranged them all to be heads, I didn’t just get lucky. Because I’m intelligent, I can make the probability whatever I want.
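In bit terms (just restating the arithmetic), the all-heads table is 10 bits of surprise if you model the coins as fair flips, and roughly zero bits once you account for an arranger who wanted heads:

```python
import math

p_by_chance = 0.5 ** 10
print(p_by_chance, -math.log2(p_by_chance))   # 1/1024, i.e. 10 bits of surprise

p_arranged = 1.0                              # an arranger who wants all heads
print(-math.log2(p_arranged))                 # 0.0 bits: the "by chance" calculation
                                              # stops measuring anything once an
                                              # intelligent agent sets the outcome
```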
Now that I think about it, I think you and I are misinterpreting what johnswentworth meant when he said “optimization pressure”. I think he just means a measure of how much we have to change the world by, in units of bits, and not any specific piece of information that the AI or alignment researchers produce.
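On that reading (my gloss of the interpretation, not a claim about what he actually wrote), the quantity would be something like the surprisal of the target outcome under the world’s default distribution:

$$\text{bits of optimization} \;\approx\; -\log_2 P_{\text{default}}(\text{world ends up in the target set})$$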
I don’t think that’s quite right (or at least, the usage isn’t consistently that). For instance
seems to be very directly about how far out of distribution the generative model is and not about how far our world is from being safe.