Rohin seems to think the point is “Simply knowing that an agent is intelligent lets us infer that it is goal-directed”, but Eliezer doesn’t seem to think that corrigible (hence not goal-directed) agents are impossible to build. (That’s actually one of MIRI’s research objectives even though they take a different approach from Paul’s.)
I think the point (from Eliezer’s perspective) is “Simply knowing that an agent is intelligent lets us infer that it is an expected utility maximizer”. The main implication is that there is no way to affect the details of a superintelligent AI except by affecting its utility function, since everything else is fixed by math (specifically the VNM theorem). Note that this is (or rather, appears to be) a very strong condition on what alignment approaches could possibly work—you can throw out any approach that isn’t going to affect the AI’s utility function. I think this is the primary reason for Eliezer making this argument. Let’s call this the “intelligence implies EU maximization” claim.
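For reference, here is the standard statement of the VNM result I have in mind (textbook form, in my own notation; nothing here is specific to Eliezer’s writing): preferences over lotteries satisfying completeness, transitivity, continuity, and independence are exactly those representable as maximizing the expectation of some utility function, unique up to positive affine transformation.

```latex
% Standard VNM representation theorem (sketch; my notation).
% If a preference relation \succeq over lotteries satisfies completeness,
% transitivity, continuity, and independence, then there exists a utility
% function u over outcomes, unique up to positive affine transformation, with
\[
  A \succeq B
  \quad\Longleftrightarrow\quad
  \mathbb{E}_{o \sim A}\!\left[u(o)\right] \;\ge\; \mathbb{E}_{o \sim B}\!\left[u(o)\right].
\]
% So once the axioms are granted, the only remaining degree of freedom in the
% agent's choice behavior is the utility function u itself.
```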
Separately, there is another claim that says “EU maximization by default implies goal-directedness” (or the presence of convergent instrumental subgoals, if you prefer that instead of goal-directedness). However, this is not required by math, so it is possible to avoid this implication by designing your utility function in just the right way.
Corrigibility is possible under this framework by working against the second claim, i.e. designing the utility function in just the right way so that you get corrigible behavior out. And in fact this is the approach to corrigibility that MIRI looked into.
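To make “designing the utility function in just the right way” a bit more concrete, here is a rough sketch in the spirit of the utility-indifference idea that MIRI’s corrigibility work analyzes; the specific formula below is my own simplification, not their exact construction.

```latex
% Rough utility-indifference sketch (simplified; not the exact published construction).
% U_N scores outcomes where the shutdown button is never pressed, U_S scores
% outcomes where the agent shuts down after a press, and a correction term
% \theta is chosen so the agent expects the same utility either way, removing
% any incentive to cause or prevent the button press.
\[
  U(h) =
  \begin{cases}
    U_N(h) & \text{if the button is never pressed in history } h,\\[4pt]
    U_S(h) + \theta & \text{if the button is pressed in history } h,
  \end{cases}
  \qquad
  \theta = \mathbb{E}\!\left[U_N \mid \text{no press}\right] - \mathbb{E}\!\left[U_S \mid \text{press}\right].
\]
```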
I am primarily taking issue with the “intelligence implies EU maximization” argument. The problem is, “intelligence implies EU maximization” is true; it just happens to be vacuous. So I can’t say that that’s what I’m arguing against. This is why I rounded it off to arguing against “intelligence implies goal-directedness”, though this is clearly a bad enough summary that I shouldn’t be saying that anymore.
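To spell out the “vacuous” part: any behavior whatsoever maximizes expected utility for some utility function over complete histories, for example the one that assigns 1 to exactly the history the policy would produce. Here is a toy sketch of that construction (the code, names, and the twitching example are purely illustrative):

```python
# Toy illustration of why "intelligence implies EU maximization" is vacuous:
# any policy maximizes *some* utility function over complete histories, so the
# claim places no constraint on behavior by itself. (Illustrative sketch only;
# the environment, names, and horizon are made up.)

from itertools import product

ACTIONS = ["twitch_left", "twitch_right"]
HORIZON = 3


def arbitrary_policy(history):
    """Some arbitrary, clearly non-goal-directed behavior: alternate twitches."""
    return ACTIONS[len(history) % 2]


def rollout(policy, horizon=HORIZON):
    """The complete history this policy produces (deterministic toy setting)."""
    history = []
    for _ in range(horizon):
        history.append(policy(tuple(history)))
    return tuple(history)


def rationalizing_utility(policy):
    """A utility function over complete histories that the policy maximizes:
    1 on the history the policy actually produces, 0 on every other history."""
    chosen = rollout(policy)
    return lambda history: 1.0 if history == chosen else 0.0


u = rationalizing_utility(arbitrary_policy)

# Among all possible histories, the twitching policy's own history gets the
# highest utility, so the policy "maximizes expected utility" with respect to u.
all_histories = list(product(ACTIONS, repeat=HORIZON))
best = max(all_histories, key=u)
assert best == rollout(arbitrary_policy)
print("Utility-maximizing history:", best)
```

The utility function carries all of the content here; EU maximization on its own rules nothing out.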
I think the point (from Eliezer’s perspective) is “Simply knowing that an agent is intelligent lets us infer that it is an expected utility maximizer”.
Eliezer explicitly disclaimed this:
A cognitively powerful agent might not be sufficiently optimized
Scenarios that negate “Relevant powerful agents will be highly optimized”, such as brute forcing non-recursive intelligence, can potentially evade the ‘sufficiently optimized’ condition required to yield predicted coherence. E.g., it might be possible to create a cognitively powerful system by overdriving some fixed set of algorithms, and then to prevent this system from optimizing itself or creating offspring agents in the environment. This could allow the creation of a cognitively powerful system that does not appear to us as a bounded Bayesian. (If, for some reason, that was a good idea.)
In “Relevant powerful agents will be highly optimized”, he went into even more detail about how one might create an intelligent agent that is not “highly optimized” and hence not an expected utility maximizer.
In summary it seems like you misunderstood Eliezer due to not noticing a distinction that he draws between “intelligent” (or “cognitively powerful”) and “highly optimized”.
In summary it seems like you misunderstood Eliezer due to not noticing a distinction that he draws between “intelligent” (or “cognitively powerful”) and “highly optimized”.
That’s true; I’m not sure what this distinction is meant to capture. I’m updating that the thing I said is less likely to be true, but I’m still somewhat confident that it captures the general gist of what Eliezer meant. I would bet on this at even odds if there were some way to evaluate it.
In “Relevant powerful agents will be highly optimized”, he went into even more detail about how one might create an intelligent agent that is not “highly optimized” and hence not an expected utility maximizer.
This is a tiny bit of his writing, and his tone makes it clear that this is unlikely. This is different from what I expected (when something has the force of a theorem, you don’t usually call its negation just “unlikely” and have a story for how it could be true), but it still seems consistent with the general story I told above.
In any case, I don’t want to spend any more time figuring out what Eliezer believes; he can say something himself if he wants. I mostly replied to this comment to clarify the particular argument I’m arguing against, which I thought Eliezer believed, but even if he doesn’t it seems like a common implicit belief in the rationalist AI safety crowd and should be debunked anyway.
In any case, I don’t want to spend any more time figuring out what Eliezer believes; he can say something himself if he wants. I mostly replied to this comment to clarify the particular argument I’m arguing against, which I thought Eliezer believed, but even if he doesn’t it seems like a common implicit belief in the rationalist AI safety crowd and should be debunked anyway.
It seems fine to debunk what you think is a common implicit belief in the rationalist AI safety crowd, but I think it’s important to be fair to other researchers and not attribute errors to them when you don’t know or aren’t sure that they actually committed such errors. For people who aren’t domain experts (which is most people), reputation is highly important for evaluating claims in a technical field like AI safety, so we should take care not to misinform them about, for example, how often someone makes technical errors.
I’m pretty sure I have never mentioned Eliezer in the Value Learning sequence. I linked to his writings because they’re the best explanation of the perspective I’m arguing against. (Note that this is different from claiming that Eliezer believes that perspective.) This post and comment thread attributed the argument and belief to Eliezer, not me. I responded because it was specifically about what I was arguing against in my post, and I didn’t say “I am clarifying the particular argument I am arguing against and am unsure what Eliezer’s actual position is” because a) I did think that it was Eliezer’s actual position, b) this is a ridiculous amount of boilerplate and c) I try not to spend too much time on comments.
I’m not feeling particularly open to feedback currently, because honestly I think I take far more care about this sort of issue than the typical researcher, but if you want to list a specific thing I could have done differently, I might try to consider how to do that sort of thing in the future.
even if he doesn’t it seems like a common implicit belief in the rationalist AI safety crowd and should be debunked anyway.
Just a note that in the link that Wei Dai provides for “Relevant powerful agents will be highly optimized”, Eliezer explicitly assigns ‘75%’ to ‘The probability that an agent that is cognitively powerful enough to be relevant to existential outcomes, will have been subject to strong, general optimization pressures.’
Just a note that in the link that Wei Dai provides for “Relevant powerful agents will be highly optimized”, Eliezer explicitly assigns ‘75%’ to ‘The probability that an agent that is cognitively powerful enough to be relevant to existential outcomes, will have been subject to strong, general optimization pressures.’
Yeah, it’s worth noting that I don’t understand what this means. By my intuitive read of the statement, I’d have given it a 95+% chance of being true, in the sense that you aren’t going to randomly stumble upon a powerful agent. But also by my intuitive read, the negative example given on that page would be a positive example:
An example of a scenario that negates RelevantPowerfulAgentsHighlyOptimized is KnownAlgorithmNonrecursiveIntelligence, where a cognitively powerful intelligence is produced by pouring lots of computing power into known algorithms, and this intelligence is then somehow prohibited from self-modification and the creation of environmental subagents.
On my view, known algorithms are already very optimized? E.g. Dijkstra’s algorithm is highly optimized for efficient computation of shortest paths.
So TL;DR idk what optimized is supposed to mean here.