Thing is: a narrow AI that doesn’t model human minds, or human attempts to disrupt its strategies, isn’t going to hide what it plans to do.
So you build your narrow super-medicine-bot and ask it to plan out how it will achieve the goal you’ve given it and to provide a full walkthrough and description.
It’s not a general AI; it doesn’t have any programming for understanding lying or misleading anyone, so it lays out the plan in full for the human operator. (Why would it not?)
who promptly changes the criteria for success and tries again.
I could well be confused about this, but: if the AI “doesn’t model human minds” at all, how could it interpret the command to “provide a full walkthrough and description”?
Until they stumble upon an AI that lies, possibly inadvertently, and then we’re dead...
But I do agree that general intelligence is more dangerous; it’s just that narrow intelligence isn’t harmless.
How do you convincingly lie without having the capability to think up a convincing lie?
Think you’re telling the truth.
Or be telling the truth, but be misinterpreted.
Every statement an AI makes to us will be a lie to some extent, simply by virtue of being a simplification so that we can understand it. If we end up selecting against simplifications that reveal nefarious plans...
But the narrow AI I described above might not even be capable of lying: it might simply spit out the drug design, with a list of estimated improvements according to the criteria it’s been given, without anyone ever realising that “reduced mortality” was code for “everyone’s dead already”.
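To make that failure mode concrete, here is a minimal Python sketch (all names and numbers are invented for illustration, not taken from any real system) of how a naively coded “reduced mortality” criterion could score a catastrophic plan as optimal:

```python
# Hypothetical sketch of an incautiously coded objective.
# Names and numbers are invented purely for illustration.

def mortality_score(deaths_in_window: int, patients_at_start: int) -> float:
    """Naive 'reduced mortality' metric: deaths during the evaluation
    window divided by the number of patients alive when the window opens.

    The metric never asks what happened *before* the window, so a plan
    under which everyone is already dead scores a perfect 0.0.
    """
    if patients_at_start == 0:
        return 0.0  # no patients left to die, so a "perfect" score
    return deaths_in_window / patients_at_start

# A genuinely good drug: 1000 patients, 50 die over the year.
print(mortality_score(deaths_in_window=50, patients_at_start=1000))  # 0.05

# A plan under which everyone is dead before the window even opens.
print(mortality_score(deaths_in_window=0, patients_at_start=0))      # 0.0
```

Note that nothing in this sketch involves lying: the report handed back (“mortality reduced to 0”) is literally true under the criterion the optimizer was given.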
Not so. You can definitely ask questions about complicated things that have simple answers.
Yes, that was an exaggeration—I was thinking of most real-world questions.
I was thinking of most real-world questions that aren’t of the form ‘Why X?’ or ‘How do I X?’.
“How much/many X?” → number
“When will X?” → number
“Is X?” → boolean
“What are the chances of X if I Y?” → number
Also, any answer that simplifies isn’t a lie if its simplified status is made clear.
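For what it’s worth, here is a small Python sketch of the idea behind that list (the interface names are invented for illustration): if the oracle’s reply channel only admits numbers and booleans, there is no free-form answer in which a “simplification” could hide.

```python
# Illustrative only: a reply channel restricted to simple, typed answers.
from dataclasses import dataclass
from typing import Union

SimpleAnswer = Union[float, bool]  # "How many X?" -> float, "Is X?" -> bool

@dataclass
class Query:
    kind: str      # e.g. "how_many", "when", "is", "chance_if"
    subject: str

def ask_oracle(query: Query) -> SimpleAnswer:
    """Toy stand-in for the narrow AI's output interface.

    Whatever reasoning happens internally, the only thing that reaches
    the operator is a single number or boolean, with no explanatory
    prose in which a misleading simplification could live.
    """
    # Placeholder values; a real system would compute these.
    if query.kind == "is":
        return True
    return 0.0

print(ask_oracle(Query(kind="chance_if", subject="mortality falls if we ship drug X")))
```

Whether those bare numbers are computed against a sane criterion is, of course, the separate problem discussed above.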
I think this sums it up well. To my understanding, it would only require someone “looking over its shoulder”, asking for its specific objective for each drug and the drug’s expected results. I doubt a “limited intelligence” would be able to lie. That is, unless it somehow mutated or accidentally became a more general AI, but then we’ve jumped rails into a different problem.
It’s possible that I’m paying too much attention to your example, and not enough attention to your general point. I guess the moral of the story is, though, “limited AI can still be dangerous if you don’t take proper precautions”, or “incautiously coded objectives can be just as dangerous in limited AI as in general AI”. Which I agree with, and which is a good point.