Does agency enable the AI to do so? If not, then why wouldn't a human being be able to do the same by using the AI in tool mode?
Yeah, I realized this while writing the comment: “(Or another human with superintelligent tools could do this as well.)” So this isn’t a risk with self-willed AI per se. But note this actually makes my original point stronger, since I was arguing against the idea that progress on AI is safe as long as it doesn’t have a “will” to act in the real world.
Just make it list equally convincing counter-arguments.
So every time you look at a (future equivalent of) website or email, you ask your tool to list equally convincing counter-arguments to whatever you’re looking at? What does “equally convincing” mean? An argument that exactly counteracts the one that you’re reading, leaving your mind unchanged?
So every time you look at a (future equivalent of) website or email, you ask your tool to list equally convincing counter-arguments to whatever you’re looking at?
Sure, why not? I think IBM is actually planning to do this with IBM Watson. Once mobile phones become fast enough, you'll be able to receive constant feedback about the ideas and arguments you encounter.
For example, a commercial tells you that you can lose 10 pounds in 1 day by taking a pill. You then either ask your "IBM Oracle" or have it set up to give you automatic feedback. It will then tell you that no studies indicate that what's advertised is possible, and that it wouldn't be healthy anyway. Or something along those lines.
I believe that in the future it will be possible to augment everything with fact-check annotations.
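To make the "automatic feedback" idea concrete, here is a minimal sketch in Python of how such an annotation layer might match claims in a text against an evidence base. Everything here (the Annotation class, the EVIDENCE table, the annotate function) is hypothetical; a real tool would query a live corpus of studies rather than a hard-coded dictionary.

```python
from dataclasses import dataclass

@dataclass
class Annotation:
    claim: str
    verdict: str  # e.g. "unsupported", "contradicted", "supported"
    note: str

# Hypothetical evidence base; a real tool would query a live corpus of studies.
EVIDENCE = {
    "lose 10 pounds in 1 day": Annotation(
        claim="lose 10 pounds in 1 day",
        verdict="contradicted",
        note="No studies indicate this is possible; attempting it would be unhealthy.",
    ),
}

def annotate(text: str) -> list:
    """Return fact-check annotations for any known claims found in the text."""
    return [ann for claim, ann in EVIDENCE.items() if claim in text.lower()]

ad = "Take this pill and lose 10 pounds in 1 day!"
for ann in annotate(ad):
    print(f"[{ann.verdict}] {ann.claim}: {ann.note}")
```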
But that's beside the point. The idea was that if you run the AI box experiment with Eliezer posing as a malicious AI trying to convince the gatekeeper to let it out of the box, while at the same time serving as a question-answering tool that uses the same algorithms as the AI, then I don't think anyone would let him out of the box. He would basically have to destroy his own arguments by giving unbiased answers about the trustworthiness of the boxed agent and the possible consequences of letting it out. At the very best, the AI in agent mode would have to contradict the tool-mode version, thereby revealing that it is dishonest and untrustworthy.
So every time you look at a (future equivalent of) website or email, you ask your tool to list equally convincing counter-arguments to whatever you’re looking at?
Sure, why not?
When I’m feeling down and my mom sends me an email trying to cheer me up, that’ll be a bit of a bummer.