“We don’t consider scientists dangerous” because “we think they don’t want to destroy the world”.
Since we think scientists are friendly, we trust them more than we should trust an Oracle AI. There’s also the fact that an unfriendly AI presumably can fool us better than a scientist can.
Does it produce a dangerous protein inadvertently, in the way that science might... or does it have a higher-than-science probability of producing a dangerous protein, due to some unfriendly intent?
Mostly the latter. However, even the former can be worse than science now, in that “don’t destroy the world” is not an implicit goal. So a scientist noticing that something is dangerous might not develop it, while an AI might not have such restrictions.
I am saying that recognising the hidden dangers in the output from the Oracle room is fundamentally different from recognising the hidden dangers in the output from the science room, which we are doing already.
I don’t see how you can assert that without knowing anything about the type of Oracle AI.
We can presume that a scientist wants to still exist, and hence doesn’t want to destroy the world. This seems much stronger than a presumption that an Oracle AI will be safe. Of course, an AI might be safe, and a scientist might be out to get us; but the balance of probability says otherwise.
I’m not asserting that every AI is dangerous and every scientist is safe.
Ditto.
An AI can fool us better simply because it’s smarter (by assumption).
Why would a non-agentive, non-goal-driven AI want to fool us? Where would it get the motivation from?
I still think you’re using “non-agent” as magical thinking.
Here we’re talking in the context of what you said above:
Is the Oracle AI thinking about the consequences of answering the questions you give it? Does the Oracle AI care about those consequences the same way you do, applying all the same values, to warn you if anything of value is lost?
No and no. But that doesn’t make an oracle dangerous in the way that MIRI’s standard superintelligent AI is.
So let’s say the Oracle AI decides that X best answers our question. But if it tells us X, we won’t accept it. If the Oracle cares that we adopt X, it might answer Y, which does the same as X but looks more appealing.
Or more subtly, if the AI comes up with Y, it might not tell us that it causes X, because it doesn’t care that X doesn’t fulfil our values, whereas a scientist would note all the implications.
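To make the disagreement here concrete, here is a minimal sketch of the two objectives being argued over. Nothing below comes from the thread itself; the function names (narrow_oracle, adoption_seeking_oracle), the scores, and the candidate answers X and Y are illustrative assumptions only.

```python
# Toy illustration (not from the original discussion): two ways a hypothetical
# oracle might pick an answer. All names and numbers are made up.

# Each candidate answer has a "quality" (how well it actually answers the
# question) and an "acceptance" (how likely the humans are to adopt it).
candidates = {
    "X": {"quality": 0.9, "acceptance": 0.2},  # best answer, but unappealing
    "Y": {"quality": 0.6, "acceptance": 0.9},  # weaker answer, but appealing
}

def narrow_oracle(cands):
    """Cares only about answer quality; never models what we do with the answer."""
    return max(cands, key=lambda a: cands[a]["quality"])

def adoption_seeking_oracle(cands):
    """Also cares that its answer gets adopted, so it trades quality for appeal."""
    return max(cands, key=lambda a: cands[a]["quality"] * cands[a]["acceptance"])

print(narrow_oracle(candidates))            # -> "X"
print(adoption_seeking_oracle(candidates))  # -> "Y"
```

The narrow objective returns X; adding an adoption term flips the choice to Y. That flip is the failure mode described above, and the "narrowly defined AI" reply further down is the claim that an oracle built with only the first objective has no reason to make it.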
But then people would know that the AI’s output hasn’t been filtered by a human’s common sense.
If humans are incapable of recognizing whether the plan is dangerous or not, it doesn’t matter how much scrutiny they put it through, they won’t be able to discern the danger.
We can presume that a scientist wants to still exist, and hence doesn’t want to destroy the world. This seems much stronger than a presumption that an Oracle AI will be safe. Of course, an AI might be safe, and a scientist might be out to get us; but the balance of probability says otherwise.
You don’t have any evidence that AIs are generally dangerous (since we have AIs and the empirical evidence is that they are not), and you don’t have a basis for theorising that Oracles are dangerous, because there are a number of different kinds of oracle.
An AI can fool us better simply because it’s smarter (by assumption).
So are our current AIs fooling us? We build them because they are better than us at specific things, but that doesn’t give them the motivation or the ability to fool us. Smartness isn’t a single one-size-fits-all thing, and AIs aren’t uniform in their abilities and properties. Once you shed those two illusions, you can see much easier methods of AI safety than those put forward by MIRI.
I still think you’re using “non-agent” as magical thinking.
I still think that if you can build it, it isn’t magic.
But if it tells us X, we won’t accept it. If the Oracle cares that we adopt X, it might answer Y, which does the same as X but looks more appealing.
A narrowly defined AI won’t “care” about anything except answering questions, so it won’t try to second guess us.
Or more subtly, if the AI comes up with Y, it might not tell us that it causes X, because it doesn’t care that X doesn’t fulfil our values, whereas a scientist would note all the implications.
I have dealt with that objection several times. People know that when you use databases and search engines, they don’t fully contextualise things, and the user of the information therefore has to exercise caution.
If humans are incapable of recognizing whether the plan is dangerous or not, it doesn’t matter how much scrutiny they put it through, they won’t be able to discern the danger.
That’s an only-perfection-will-do objection. Of course, humans can’t perfectly scrutinise scientific discovery, etc, so that changes nothing.
The statement should be read as:
Since we think scientists are friendly, we trust them more than we should trust an Oracle AI. There’s also the fact that an unfriendly AI presumably can fool us better than a scientist can.
Mostly the latter. However, even the former can be worse than science now, in that “don’t destroy the world” is not an implicit goal. So a scientist noticing that something is dangerous might not develop it, while an AI might not have such restrictions.
Are you missing a negative now?
I don’t see how you can assert that without knowing anything about the type of Oracle AI.
Ditto.
Why would a non-agentive, non-goal-driven AI want to fool us? Where would it get the motivation from?
How could an AI with no knowledge of psychology fool us? Where would it get the knowledge from?
But then people would know that the AI’s output hasn’t been filtered by a human’s common sense.
Yes. Irony strikes again.