Wei Dai’s comment is full of wisdom. In particular:
The Orthogonality Thesis (or its denial) must assume that certain types of AI, e.g., those based on generic optimization algorithms that can accept a wide range of objective functions, are feasible (or not) to build, but I don’t think we can safely make such assumptions yet.
But even if that is true, it is nowhere near enough to support an OT that can be plugged into an unfriendliness argument. The unfriendliness argument requires that it is reasonably likely that researchers could create a paperclipper without meaning to. However, if paperclippers require an architecture (a possible architecture, but only one possible architecture) where goals and their implementation are decoupled, then both requirements are undermined. It is not clear that we can build such machines (“based on generic optimization algorithms that can accept a wide range of objective functions”), hence a lack of likelihood; and it is also not clear that well-intentioned people would build them even if they could.
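To make "goals and their implementation are decoupled" concrete, here is a toy sketch of what such an architecture looks like in miniature. Everything in it is hypothetical and chosen for illustration (`hill_climb`, `paperclip_score`, `staple_score`); it says nothing about whether a general AI could or would be built this way, only what the decoupling means structurally.

```python
from typing import Callable


def hill_climb(objective: Callable[[int], float], start: int = 0, steps: int = 1000) -> int:
    """Generic optimizer: it only ever sees numbers returned by `objective`.

    Nothing in this function depends on what the objective is "about".
    """
    x = start
    for _ in range(steps):
        # Look at the current point and its two neighbours; keep the best one.
        best = max((x - 1, x, x + 1), key=objective)
        if best == x:  # no neighbour is better: local optimum reached
            break
        x = best
    return x


# Two unrelated, purely illustrative "goals" plugged into the same optimizer.
def paperclip_score(x: int) -> float:
    return -(x - 42) ** 2  # peaks at x = 42


def staple_score(x: int) -> float:
    return -(x + 7) ** 2  # peaks at x = -7


clips = hill_climb(paperclip_score)
staples = hill_climb(staple_score)
```

The same search code pursues whichever objective it is handed, which is the property the quoted passage attributes to "generic optimization algorithms". An architecture in which the goal is not a swappable parameter like this would not have that property.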
Unfriendliness of the sort that MIRI worries about could be sidestepped by not adopting the architecture that supports orthogonality, and choosing one of a number of alternatives.