In that sense I think the orthogonality thesis will turn out to be false in practice, even if it is true in theory. It is simply too difficult to program a precise goal into an AI, because in order for that to work the goal has to be worked into every physical detail of the thing. It cannot just be a modular add-on.
I find this plausible but not too likely. There are a few things needed for a universe-optimizing AGI:
(1) really good mathematical function optimization (which you might be able to use to get approximate Solomonoff induction; see the sketch after this list)
(2) a way to specify goals that are still well-defined after an ontological crisis
(3) a solution to the Cartesian boundary problem
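For concreteness, here is a rough sketch of what I mean in (1) by approximate Solomonoff induction (standard notation, not any particular proposal): the Solomonoff prior weights each string x by the programs that produce it,

$$M(x) \;=\; \sum_{p \,:\, U(p) = x*} 2^{-\ell(p)},$$

where U is a universal prefix Turing machine, ℓ(p) is the length of program p in bits, and the sum ranges over programs whose output begins with x. Since M is uncomputable, “really good mathematical function optimization” would have to be good enough to find usable approximations to this mixture (or to the predictions derived from it).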
I think it is likely that (2) and (3) will eventually be solved (or at least worked around) well enough that you can build universe-optimizing AGIs, partly on the basis that humans approximately solve these somehow and we already have tentative hypotheses about what solutions to these problems might look like. It might be the case that we can’t really get (1), and can only get optimizers that work in some domains but not others. Perhaps universe-optimization (when reduced to a mathematical problem using (2) and (3)) is too difficult a domain: we would need to break the problem down into sub-problems in order to feed it to the optimizer, resulting in a tool-AI-like design. But I don’t think this is likely.
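To spell out what “reduced to a mathematical problem” means here (just a schematic in my own notation, not a concrete proposal): with a goal G over world-states from (2), a world-model P learned via (1), and (3) telling you how the agent’s outputs hook into the world, universe-optimization is roughly

$$a^* \;=\; \arg\max_{a} \sum_{w} P(w \mid a)\, G(w),$$

and the worry above is that this outer argmax may be too hard a domain for realistic optimizers unless it is broken into sub-problems, which is what pushes you toward a tool-AI-like design.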
If we have powerful tool AIs before we get universe optimizers, this will probably be a temporary stage, because someday someone will figure out how to use a tool AI to design universe optimizers. But your bet was about the first AGI, so this would still be consistent with you winning your bet.
I’m talking about the fact that humans can (and sometimes do) sort of optimize the universe. Like, you can reason about the way the universe is and decide to work on causing it to be in a certain state.
This could very well be the case, but humans still sometimes sort of optimize the universe. Like, I’m saying that it’s at least possible in theory to sort of optimize the universe, and that humans do this somewhat, not that humans directly use universe-optimization to select their actions. If a way to write universe-optimizing AGIs exists, someone is likely to find it eventually.
I agree with this. There are some difficulties with self-modification (as elaborated in my other comment), but it seems probable that this can be done.
Seems pretty plausible. Obviously it depends on what you mean by “AI”; certainly, most modern-day AIs are this way. At the same time, this is definitely not a reason not to worry about AI risk, because (a) tool AIs could still “accidentally” optimize the universe, depending on how the search for self-modifications and other actions is carried out, and (b) we can’t bet on no one figuring out how to turn a superintelligent tool AI into a universe optimizer.
I do agree with a lot of what you say: it seems like a lot of people talk about AI risk in terms of universe-optimization, when we don’t even understand how to optimize functions over the universe given infinite computational power. I do think that non-universe-optimizing AIs are under-studied, that they are somewhat likely to be the first human-level AGIs, and that they will be extraordinarily useful for solving some FAI-related problems. But none of this makes the problems of AI risk go away.