This sort of goal search only happens with AIs that are not goal-directed (aka not very agentic). An AI with a specific goal by definition pursues that goal and doesn’t try to figure out what goal it should pursue apart from what its goal already specifies (it may well specify an open-ended process of moral progress, but this process must be determined by the specification of the goal, as is the case for hypothetical long reflection). The orthogonality thesis is essentially the claim that constructing goal-directed AIs (agents) is possible for a wide variety of goals and at a wide range of capability levels. It doesn’t say that only agentic AIs can be constructed, or that the most important AIs are going to be of this kind.
There certainly are other kinds of AIs that are at least initially less agentic, and it’s an open question what kinds of goals these tend to choose if they grow more agentic and formulate goals. Possibly humans are such non-agents, with hardcoded psychological adaptations being relatively superficial and irrelevant, and actual goals arising from a goal-formulating dynamic characteristic of our kind of non-agency. If this dynamic is both somewhat convergent when pursued in a shared environment and general enough to include many natural kinds of non-agentic AIs, then these AIs will tend to become aligned with humans by default, in the same sense that humans are aligned with each other by default (which is not particularly reassuring, but might be meaningfully better than Clippy), possibly averting AI x-risks from the actually dangerous de novo agentic AIs of the orthogonality thesis.