The orthogonality thesis is thus much stronger than the denial of a (presumed) Kantian thesis that more intelligent beings would automatically be more ethical, or that an omniscient agent would apply its expected-utility maximisation to anything, including the selection of the best goals: it denies any relation between intelligence and the ability to reflect on goals.
I don’t think this is true, and I have two main lines of argument / intuition pumps. I’ll present one here and save the other for a later section, where it fits better.
Are there several different reflectively stable moral equilibria, or only one? For example, it might be possible to have a consistent, philosophically stable egoistic worldview, and also possible to have a consistent, philosophically stable altruistic worldview. Through this lens, the orthogonality thesis is the claim that there are at least two such stable equilibria, and that which equilibrium you end up in isn’t related to intelligence. [Some people might be egoists because they don’t realize that other people have inner lives, and increased intelligence unlocks their latent altruism; but some people might just not care about other people in a way that makes them egoists, and making them ‘smarter’ doesn’t have to touch that.]
For example, you might imagine an American nationalist and a Chinese nationalist, both remaining nationalistic as they become more intelligent, and never switching which nation they like more, because that choice was made for historical reasons rather than logical ones. If you instead imagine that, no, at some intelligence threshold they have to discard their nationalism, then you need to make that case in opposition to the orthogonality thesis.
For some goals, I do think it’s the case that at some intelligence threshold you have to discard them (hence the ‘more or less’), and I think many more ‘goals’ are unstable: the more you think about them, the more they dissolve and are replaced by one of the stable attractors. For example, you might imagine that you can have reflectively stable nationalists who eat meat and universalists who are vegan, but that universalists who eat meat are not reflectively stable: either they realize their arguments for eating meat imply nationalism, or their arguments against nationalism imply not eating meat. [Or maybe the middle position is reflectively stable, idk.]
In this view, the existential risk argument is less “humans will be killed by robots and that’s sad” and more “our choice of which superintelligence to build will decide what color the lightcone explosion is, some of those possibilities are as bad as or worse than all humans dying, and the differences between colors might be colossally important.” [For example, some philosophers today think that uploading human brains to silicon substrates would murder them / eliminate their moral value; it seems important for the system colonizing the galaxies to get that right! Some philosophers think that factory farming is immensely bad, and getting questions like that right before you hit copy-paste billions of times seems important.]