My objection is mostly fleshed out in my other comment. I’d just flag here that “In other words, you have to do things the ‘hard way’—no shortcuts” assigns the burden of proof in a way I don’t think is usually helpful. You shouldn’t believe my claim that I have a deep theory linking AGI and evolution unless I can explain some really compelling aspects of that theory. Otherwise you’d also have to believe in the deep theory linking AGI and capitalism, the one linking AGI and symbolic logic, the one linking intelligence and ethics, the one linking recursive self-improvement with cultural evolution, and so on.
Now, I’m happy to agree that all of the links I just mentioned are useful lenses which help you understand AGI. But for utility theory to do the type of work Eliezer tries to make it do, it can’t just be a useful lens—it has to be something much more fundamental. And that’s what I don’t think Eliezer’s established.
It also isn’t clear to me that Eliezer has established the strong inferences he draws from noticing this general pattern (“expected utility theory/consequentialism”). But when you asked Eliezer (in the original dialogue) to give examples of successful predictions, I was thinking “No, that’s not how these things work.” In the mistaken applications of Grand Theories you mention (AGI and capitalism, AGI and symbolic logic, intelligence and ethics, recursive self-improvement and cultural evolution, etc.), the easiest way to point out why they are dumb is with counterexamples. We can quickly “see” the counterexamples. E.g., if you’re trying to see AGI as the next step in capitalism, you’ll be able to find counterexamples where things become altogether different (misaligned AI killing everything; singleton that brings an end to the need to compete). By contrast, if the theory fits, you’ll find that whenever you try to construct such a counterexample, it is just a non-central (but still valid) manifestation of the theory. Eliezer would probably say that people who are good at this sort of thinking will quickly see how the skeptics’ counterexamples fall relevantly short.
---
The reason I remain a bit skeptical about Eliezer’s general picture:

- I’m not sure whether his thinking about AGI makes implicit, questionable predictions about humans.
- I don’t understand his thinking well enough to be confident that it doesn’t.
- It seems to me that Eliezer_2011 placed weirdly strong emphasis on presenting humans in ways that matched the pattern “(scary) consequentialism always generalizes as you scale capabilities.” I consider some of these claims false, or at least would want to make the counterexamples more salient.
For instance:
- Eliezer seemed to think that “extremely few things are worse than death” is something all philosophically sophisticated humans would agree with.
- Early writings on CEV seemed to emphasize things like the “psychological unity of humankind” and talked as though humans would mostly share the same motivational drives, including with respect to “enjoying being agenty” as opposed to “grudgingly doing agenty things but wishing you could be done with your obligations faster.”
- In HPMOR, all the characters are either not philosophically sophisticated or have been amped up into scary consequentialists who plot all the time.
All of the above could be totally innocent matters of wanting to emphasize the things other commenters were missing, so they aren’t necessarily indicative of overlooking certain possibilities. Still, the pattern makes me wonder whether Eliezer has spent much time imagining what sorts of motivations humans can have that make them benign not in terms of outcome-related ethics (what they want the world to look like) but in terms of relational ethics (whom they want to respect or assist, what sort of role model they want to follow). It makes me wonder whether it’s really true that, when you try to train an AI to be helpful and corrigible, the “consequentialism wants to become agenty with its own goals” part will be stronger than the “helping this person feels meaningful” part. (If the latter wins, you get an agent that’s consequentialist about following proper cognition rather than about other world-outcomes.)
FWIW I think I mostly share Eliezer’s intuitions about the arguments where he makes them; I just feel like I lack the part of his picture that lets him discount the observation that some humans are interpersonally corrigible and not all that focused on other explicit goals, and that maybe this means corrigibility has a crisp/natural shape after all.
> the easiest way to point out why they are dumb is with counterexamples. We can quickly “see” the counterexamples. E.g., if you’re trying to see AGI as the next step in capitalism, you’ll be able to find counterexamples where things become altogether different (misaligned AI killing everything; singleton that brings an end to the need to compete).
I’m not sure how this would actually work. The proponent of the AGI-capitalism analogy might say “ah yes, AGI killing everyone is another data point on the trend of capitalism becoming increasingly destructive”. Or they might say (as Marx did) that capitalism contains the seeds of its own destruction. Or they might just deny that AGI will play out the way you claim, because their analogy to capitalism is more persuasive than your analogy to humans (or whatever other reasoning you’re using). How do you then classify this as a counterexample rather than a “non-central (but still valid) manifestation of the theory”?
My broader point is that these types of theories are usually sufficiently flexible that they can “predict” most outcomes, which is why it’s so important to pin them down by forcing them to make advance predictions.
On the rest of your comment, +1. I think that one of the weakest parts of Eliezer’s argument was when he appealed to the difference between von Neumann and the village idiot in trying to explain why the next step above humans will be much more consequentialist than most humans (although unfortunately I failed to pursue this point much in the dialogue).
> How do you then classify this as a counterexample rather than a “non-central (but still valid) manifestation of the theory”?
My only reply is “You know it when you see it.” And yeah, a crackpot would reason the same way, but non-modest epistemology says that if it’s obvious to you that you’re not a crackpot then you have to operate on the assumption that you’re not a crackpot. (In the alternative scenario, you won’t have much impact anyway.)
Specifically, the situation I mean is the following:
- You have an epistemic track record like Eliezer’s, or you’re someone with lots of highly upvoted posts in our communities.
- You find yourself having strong intuitions about how to apply powerful principles like “consequentialism” to new domains, and your intuitions are strong because it feels like you have a gears-level understanding that others lack. You trust your intuitions in cases like these.
My recommended policy in cases where this applies is “trust your intuitions and operate on the assumption that you’re not a crackpot.”
Maybe there’s a crux here about how much of scientific knowledge depends on successful predictions. In my view, the Sequences have convincingly argued that locating the hypothesis in the first place is often done in the absence of any successful predictions, which shows there’s a core of “good reasoning” that lets you jump to (tentative) conclusions, or at least good guesses, much faster than if you tried lots of things at random.
> My recommended policy in cases where this applies is “trust your intuitions and operate on the assumption that you’re not a crackpot.”
Oh, certainly Eliezer should trust his intuitions and believe that he’s not a crackpot. But I’m not arguing about what the person with the theory should believe, I’m arguing about what outside observers should believe, if they don’t have enough time to fully download and evaluate the relevant intuitions. Asking the person with the theory to give evidence that their intuitions track reality isn’t modest epistemology.