I mean, in my opinion, the most straightforward reading of Chapters 7 and 8 of Superintelligence is just a possibility-therefore-probability fallacy.
The most relevant quote from Superintelligence (that I could find) is:
Second, the orthogonality thesis suggests that we cannot blithely assume that a superintelligence will necessarily share any of the final values stereotypically associated with wisdom and intellectual development in humans—scientific curiosity, benevolent concern for others, spiritual enlightenment and contemplation, renunciation of material acquisitiveness, a taste for refined culture or for the simple pleasures in life, humility and selflessness, and so forth. We will consider later whether it might be possible through deliberate effort to construct a superintelligence that values such things, or to build one that values human welfare, moral goodness, or any other complex purpose its designers might want it to serve. But it is no less possible—and in fact technically a lot easier—to build a superintelligence that places final value on nothing but calculating the decimal expansion of pi. This suggests that—absent a special effort—the first superintelligence may have some such random or reductionistic final goal.
My interpretation is that Bostrom is trying to be reasonably precise here and trying to do something like:
(1) You might have “blithely assumed” that things would necessarily be fine, but the orthogonality thesis says otherwise. (Again, extremely obvious.)
(2) Also, it (separately) seems to me (Bostrom) to be technically easier to get your AI to have a simple goal, which implies that random goals might be more likely.
I think you disagree with point (2) here (and I disagree with point (2) as well), but this seems different from the claim you made. (I didn’t bother looking for Bostrom’s arguments for (2), but I expect them to be weak and easily defeated, at least ex post.)
TBC, I can see where you’re coming from, but I think Bostrom tries to avoid this fallacy. It would be considerably better if he explicitly called out this fallacy and disclaimed it. So, I think he should be partially blamed for likely misinterpretations.