1 seems a bit odd. You could argue that the Argument from Mind Design Space Width supports it, but this just demonstrates that this initial argument may be too crude to do more than act as an intuition pump. By the time we’re talking about the Argument from Reflective Stability, I don’t think that argument supports “you can have circular preferences” any more.
That’s exactly the point (except I’m not sure what you mean by “the Argument from Reflective Stability”; the capital letters suggest you’re talking about something very specific). The arguments in favor of Orthogonality just seem like crude intuition pumps. The purpose of 1 was not to actually talk about circular preferences, but to pick an example of something that the largeness of mind design space supports, yet which we expect to break for some other reason. Orthogonality feels like claiming the existence of an integer with two distinct prime factorizations because “there are so many integers”. Like the integers, mind design space is vast, but not arbitrary. It seems unlikely to me that no theorem could show that sufficiently high cognitive power implies some restriction on goals.
Thanks for the reply. I agree that strong Inevitability is unreasonable, and I understand the function of #1 and #2 in disrupting a prior frame of mind which assumes strong Inevitability, but that’s not the only alternative to Orthogonality. I’m surprised that the arguments are considered successively stronger arguments in favor of Orthogonality, since #6 basically says “under reasonable hypotheses, Orthogonality may well be false.” (I admit that’s a skewed reading, but I don’t know what the referenced ongoing work looks like, so I’m skipping that bit for now. [Edit: is this “tiling agents”? I’m not familiar with that work, but I can go learn about it.])
The other arguments are interesting commentary, but don’t argue that Orthogonality is true for agents we ought to care about.
Gandhian stability argues that self-modifying agents will try to preserve their preference systems, but not that they can become arbitrarily powerful while doing so. As it happens, circular preference systems illustrate how Gandhian stability could limit how powerful a cognitive agent can become.
The unbounded agents argument says Orthogonality is true when “mind space” is broader than what we care about.
The search tractability argument looks like a statement about the relative difficulty of accomplishing different goals, not the relative difficulty of holding those goals. I don’t mean to dismiss the argument, but I don’t understand it. I’m not even clear on exactly what the argument is saying about the tractability of searching for strategies for different goals. Is the claim that it’s the same for all possible goals?