Nah, likelihood of torture is real low. Most likely causes are an accidental sign flip in a reward function, or some sort of deliberate terrorism.
Although an accidental sign flip in a reward function has indeed happened to people in the past, you just need some basic inspection and safeguarding of the reward function (or steering vector, or learned reward model, or dynamic representation of human moral reasoning) to drive the already-low probability of this and related errors down by orders of magnitude. This is something we’re on track to have handled.
And even cartoon terrorists probably have better things to do with the cutting-edge AI they somehow have.
What do you think the likelihood of extinction is?
20% maybe? I’m feeling optimistic today.