> I think the orthogonality thesis is wrong.

For instance, without rejecting the orthogonality thesis, one might think we need to align the AI rather than it automatically finding out the meaning of life using its superior intelligence.
If Nora Belrose wants to make that argument, then she can just do that.
Thanks to Eliezer Y. pushing the orthogonality thesis in rationalist circles, I don’t think anyone wants to make that argument, and that’s why I didn’t address it but instead just showed how it used to be believed.
> I think the orthogonality thesis is wrong. For instance, without rejecting the orthogonality thesis, one might think we need to align the AI …
So what’s the probability of a misaligned AI killing us? Is it close to 1.0? Is it enough to justify nuking chip fabs?
The OT doesn’t tell you—it doesn’t quantify probabilities.
If it’s a claim about the probability of certain (intelligence, goal) pairs occurring, then it’s False; inasmuch as it doesn’t quantify probability, it is Useless.
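To spell out the two readings being contrasted here (the notation is mine; the thesis itself supplies neither the sets nor a probability measure), the possibility reading only quantifies over what agents could exist, while a probability reading would need a measure over which (intelligence, goal) pairs actually get built:

$$\textbf{Possibility:}\quad \forall i \in I,\ \forall g \in G:\ \exists \text{ a possible agent } A \text{ with } \mathrm{intelligence}(A)=i \text{ and } \mathrm{goal}(A)=g$$

$$\textbf{Probability:}\quad \Pr\big[\,\mathrm{goal}(A) \in G_{\mathrm{harmful}} \;\big|\; \mathrm{intelligence}(A) \ge i\,\big] \;=\; \,?$$

Bostrom-style statements of the thesis assert something like the first line and are silent on the second.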
> …rather than it automatically finding out the meaning of life using its superior intelligence.
An AI can non-automatically do something like that, i.e. constitutional AI.

Edit: The OT is a valid argument against a particularly strong form of let-the-AI-figure-it-out, but not against all forms.
The orthogonality thesis tells you that attempting a logical proof that it is a good idea to make an AI is doomed to fail, because whether it’s a good idea depends on the goals you give it. This sort of logical proof might seem absurd, but I refer you to E. Yudkowsky’s argument to show how, prior to the popularization of the orthogonality thesis, it apparently seemed plausible to some people.

I’m not claiming that the orthogonality thesis is a knockdown argument with respect to everything, only that it’s important for directing the conversation towards productive topics like “what do we want the AI to do, and how will its goals come to be that?” rather than unproductive topics.
> An AI can non-automatically do something like that, i.e. constitutional AI.
Current constitutional AI doesn’t do any backchaining, and is therefore limited in potential.
It seems easy to imagine that one could expand it to do backchaining using chain-of-thought prompting etc. In that case, the values of the AI would presumably be determined by human-like moral reasoning. The issue with human-like moral reasoning is that when humans do it, they tend to come up with wild and sketchy ideas like tiling the universe with hedonium (if utilitarian) or various other problematic things (if non-utilitarian). Given this track record, I’m not convinced constitutional AI scales to superintelligences.
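To make the contrast with current constitutional AI concrete, here is a minimal sketch (the `generate` function is a stand-in for a language-model call, and the constitution and goals are invented for illustration, not anyone’s actual setup): plain constitutional AI critiques and revises a draft against fixed principles in one pass, while a backchaining variant would recurse from a goal to subgoals via chain-of-thought before answering.

```python
# Illustrative sketch only: `generate` stands in for a language-model call.
def generate(prompt: str) -> str:
    """Placeholder LLM call; returns a canned string so the sketch runs as-is."""
    return f"<model output for: {prompt[:50]}...>"

CONSTITUTION = ["Be helpful.", "Avoid advocating harm."]  # invented example principles

def constitutional_answer(question: str) -> str:
    """Current-style constitutional AI: draft, critique against fixed principles, revise once.
    No backchaining: the principles are applied directly to a single draft."""
    draft = generate(question)
    critique = generate(f"Critique this draft against {CONSTITUTION}: {draft}")
    return generate(f"Revise the draft to address the critique: {critique}")

def backchaining_answer(goal: str, depth: int = 2) -> str:
    """Hypothetical extension: chain-of-thought backchaining from a goal to subgoals.
    Here the model's own (human-like) moral reasoning, not just the fixed principles,
    determines what ends up being pursued."""
    if depth == 0:
        return constitutional_answer(goal)
    subgoals = generate(f"List subgoals that would achieve: {goal}").split(";")
    partials = [backchaining_answer(sub.strip(), depth - 1) for sub in subgoals]
    return generate(f"Combine these partial results toward '{goal}': {partials}")

print(constitutional_answer("Should we deploy this system?"))
print(backchaining_answer("Decide which values to act on"))
```

The worry is about the second function: once the values steering the outcome come from open-ended, human-like moral reasoning rather than a short fixed list of principles, the track record of that kind of reasoning is what matters.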
> The orthogonality thesis tells you that attempting a logical proof that it is a good idea to make an AI is doomed to fail, because whether it’s a good idea depends on the goals you give it.
It depends on whether it will be safe, in general. Not having goals, or having corrigible goals, are forms of safety, so it doesn’t all depend on the goals you initially give it.
> it’s important for directing the conversation towards productive topics like “what do we want the AI to do, and how will its goals come to be that?”
As widely (mis)understood, it smuggles in the idea that AIs are necessarily goal-driven, that the goals are necessarily stable and incorrigible, etc. It’s not widely recognised that there is a wider orthogonality thesis: mindspace also contains many combinations of capability and goal stability/instability. That weakens the MIRI/Yudkowsky argument that goal alignment has to be got right the first time.
>’m not convinced constitutional AI scales to superintelligences.
You can obviously create AIs that don’t try to achieve anything in the world, and sometimes they are useful for various reasons, but some people who are trying to achieve things in the world find it to be a good idea to make AIs that also try to achieve things in the world, and the existence of these people is sufficient to create existential risk.
I think the sign flip would be:
>’m not convinced constitutional AI scales to superintelligences.
I’m not convinced that ASI will happen overnight.
> …the existence of these people is sufficient to create existential risk.
But it’s not the OT telling you that.
The orthogonality thesis is indeed not sufficient to derive everything in AI safety, but that doesn’t mean it’s trivial.