This argument is empirical, while the orthogonality thesis is merely philosophical, which makes it the stronger argument imo.
But this argument does not imply alignment is easy. It implies that goals at an acute angle to the training objective are easy to instill, while orthogonal goals are hard. So a player-of-games agent will be easy to align toward power-seeking but hard to align toward banter, and a chat agent will be easy to align toward banter but hard to align toward power-seeking.
We are currently in the chat phase, which suggests easier alignment to chatty, huggy human values, but we might soon enter the player-of-long-term-games phase. So this argument implies that alignment is easier now, and that it will get harder if we enter the era of RL agents trained for long-term planning.