The orthogonality thesis tells you that the move where you attempt a logical proof that it is a good idea to make an AI is doomed to fail, because whether it’s a good idea depends on the goals you give it.
It depends on whether it will be safe, in general. Not having goals, or having corrigible goals, are forms of safety, so it doesn’t all depend on the goals you initially give it.
It’s important for directing the conversation towards productive topics like “what do we want the AI to do, and how will it come to have those goals?”
As widely (mis)understood, it smuggles in the idea that AIs are necessarily goal driven, that their goals are necessarily stable and incorrigible, and so on. It’s not widely recognised that there is a wider orthogonality thesis: Mindspace also contains many combinations of capability and goal stability/instability. That weakens the MIRI/Yudkowsky argument that goal alignment has to be got right first time.
>I’m not convinced constitutional AI scales to superintelligences.
You can obviously create AIs that don’t try to achieve anything in the world, and sometimes they are useful for various reasons, but some people who are trying to achieve things in the world find it to be a good idea to make AIs that also try to achieve things in the world, and the existence of these people is sufficient to create existential risk.
>I’m not convinced constitutional AI scales to superintelligences.
I’m not convinced that ASI will happen overnight.
>You can obviously create AIs that don’t try to achieve anything in the world, and sometimes they are useful for various reasons, but some people who are trying to achieve things in the world find it to be a good idea to make AIs that also try to achieve things in the world, and the existence of these people is sufficient to create existential risk.
But it’s not the OT telling you that.
The orthogonality thesis is indeed not sufficient to derive all of AI safety, but that doesn’t mean it’s trivial.