I’m kind of dubious that you needed ‘beware of destroying mankind’ in a physics textbook to get Teller to check whether a nuke could cause thermonuclear ignition of the atmosphere or seawater, but if it is there, I guess it won’t hurt.
Here’s another reason why I don’t like “AI risk”: it brings to mind analogies like physics catastrophes or astronomical disasters, and lets AI researchers think that their work is ok as long as they have little chance of immediately destroying Earth. But the real problem is how we build or become a superintelligence that shares our values, and given that this seems very difficult, any progress that doesn’t contribute to the solution but brings forward the date by which we must solve it (or be stuck with something very suboptimal, even if it doesn’t kill us) is bad. This includes AI progress that is not immediately dangerous.
ETA: I expanded this comment into a post here.
Well, there’s this implied assumption that a super-intelligence that ‘does not share our values’ shares our domain of definition of the values. I can make a fairly intelligent proof generator, far beyond human capability if given enough CPU time; it won’t share any values with me, not even the domain of applicability; the lack of shared values with it is so profound as to make it not do anything whatsoever in the ‘real world’ that I am concerned with. Even if it was meta-strategic to the point of potentially e.g. searching for ways to hack into a mainframe to gain extra resources to do the task ‘sooner’ by wall-clock time, it seems very dubious that by mere accident it would have proper symbol grounding, wouldn’t wirehead (i.e. would privilege the solutions that don’t involve just stopping said clock), etc. The same goes for other practical AIs, even the evil ones that would e.g. try to take over the internet.
You’re still falling into the same trap, thinking that your work is ok as long as it doesn’t immediately destroy the Earth. What if someone takes your proof generator design, and uses the ideas to build something that does affect the real world?
You’re still falling into the same trap, thinking that your work is ok as long as it doesn’t immediately destroy the Earth. What if someone takes your proof generator design, and uses the ideas to build something that does affect the real world?
Well, let’s say in 2022 we have a bunch of tools along the lines of automatic problem solving, unburdened by their own will (not because they were so designed but by simple omission of the immense counterproductive effort). Someone with a bad idea comes around, downloads some open-source software, cobbles together some self-propelling ‘thing’ that is ‘vastly superhuman’ circa 2012. Keep in mind that we still have our tools that make us ‘vastly superhuman’ circa 2012, and I frankly don’t see how ‘automatic will’, for lack of a better term, is contributing anything here that would make the fully automated system competitive.
Well, one thing the self-willed superintelligent AI could do is read your writings, form a model of you, and figure out a string of arguments designed to persuade you to give up your own goals in favor of its goals (or just trick you into doing things that further its goals without your realizing it). (Or another human with superintelligent tools could do this as well.) Can you ask your “automatic problem solving tools” to solve the problem of defending against this, while not freezing your mind so that you can no longer make genuine moral/philosophical progress? If you can do this, then you’ve pretty much already solved the FAI problem, and you might as well ask the “tools” to tell you how to build an FAI.
Well, one thing the self-willed superintelligent AI could do is read your writings, form a model of you, and figure out a string of arguments designed to persuade you to give up your own goals in favor of its goals...
Does agency enable the AI to do so? If not, then why wouldn’t a human being be able to do the same by using the AI in tool mode?
Can you ask your “automatic problem solving tools” to solve the problem of defending against this...
Just make it list equally convincing counter-arguments.
Does agency enable the AI to do so? If not, then why wouldn’t a human being be able to do the same by using the AI in tool mode?
Yeah, I realized this while writing the comment: “(Or another human with superintelligent tools could do this as well.)” So this isn’t a risk with self-willed AI per se. But note this actually makes my original point stronger, since I was arguing against the idea that progress on AI is safe as long as it doesn’t have a “will” to act in the real world.
Just make it list equally convincing counter-arguments.
So every time you look at a (future equivalent of) website or email, you ask your tool to list equally convincing counter-arguments to whatever you’re looking at? What does “equally convincing” mean? An argument that exactly counteracts the one that you’re reading, leaving your mind unchanged?
So every time you look at a (future equivalent of) website or email, you ask your tool to list equally convincing counter-arguments to whatever you’re looking at?
Sure, why not? I think IBM is actually planning to do this with IBM Watson. Once mobile phones become fast enough, you could receive constant feedback about the ideas and arguments you encounter.
For example, a commercial tells you that you can lose 10 pounds in 1 day by taking a pill. You then either ask your “IBM Oracle” or have it set up to give you automatic feedback. It will then tell you that there are no studies indicating that anything like the advertised effect is possible, and that it wouldn’t be healthy anyway. Or something along those lines.
I believe that in the future it will be possible to augment everything with fact-check annotations.
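To make the idea concrete, here is a minimal sketch (not from the thread) of such an automatic fact-check feedback loop. The query_oracle function is a hypothetical stand-in for whatever Watson-like question-answering service is being assumed.

```python
def query_oracle(question: str) -> str:
    """Hypothetical stand-in for a superhuman question-answering tool (assumed, not a real API)."""
    raise NotImplementedError("assumed external question-answering service")


def annotate_claim(claim: str) -> str:
    """Attach a fact-check annotation to a claim encountered in an ad, email, etc."""
    evidence = query_oracle(f"What is the evidence for or against: {claim}")
    counterarguments = query_oracle(f"List the strongest counter-arguments to: {claim}")
    return f"CLAIM: {claim}\nEVIDENCE: {evidence}\nCOUNTER-ARGUMENTS: {counterarguments}"


# Example, using the weight-loss commercial above (commented out because the oracle is hypothetical):
# print(annotate_claim("You can lose 10 pounds in 1 day by taking this pill."))
```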
But that’s beside the point. The idea was that if you run the AI-box experiment with Eliezer posing as a malicious AI trying to convince the gatekeeper to let it out of the box, and at the same time as a question-answering tool using the same algorithms as the AI, then I don’t think anyone would let him out of the box. He would basically have to destroy his own arguments by giving unbiased answers about the trustworthiness of the boxed agent and the possible consequences of letting it out of the box. At best, the AI in agent mode would have to contradict the tool mode version and thereby reveal that it is dishonest and not trustworthy.
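A minimal sketch of the cross-check just described, assuming hypothetical agent_mode_answer and tool_mode_answer interfaces to the same underlying algorithms; it only illustrates the decision rule of distrusting the agent whenever it contradicts its own tool-mode output.

```python
def agent_mode_answer(question: str) -> str:
    """Answer from the boxed AI acting as an agent with its own goals (hypothetical interface)."""
    raise NotImplementedError("assumed boxed agent")


def tool_mode_answer(question: str) -> str:
    """Answer from the same algorithms run as a passive question-answerer (hypothetical interface)."""
    raise NotImplementedError("assumed tool-mode instance")


def gatekeeper_check(question: str) -> str:
    """Compare both modes and distrust the agent if it contradicts its own tool-mode output."""
    agent = agent_mode_answer(question)
    tool = tool_mode_answer(question)
    if agent.strip().lower() != tool.strip().lower():
        return f"Distrust the agent; prefer the tool mode answer: {tool}"
    return f"Both modes agree: {tool}"


# A question a gatekeeper might pose (commented out because both interfaces are hypothetical):
# gatekeeper_check("Should I let the boxed AI out of the box, and why?")
```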
So every time you look at a (future equivalent of) website or email, you ask your tool to list equally convincing counter-arguments to whatever you’re looking at?
Sure, why not?
When I’m feeling down and my mom sends me an email trying to cheer me up, that’ll be a bit of a bummer.
Yep. Majorly awesome scenario degrades into ads vs. adblock when you consider everything in the future, not just the self-willed robot. As a matter of fact, a lot of work is put into constructing convincing strings of audio and visual stimuli, and into ignoring those strings.
Superstimuli and the Collapse of Western Civilization. Using such skills to manipulate other humans appears to be what we grew intelligence for, of course. As I note, Western civilisation is already basically made of the most virulent toxic memes we can come up with, in the noble causes of selling toothpaste and car insurance and, of course, getting laid. It seems to be what we do now that we’ve more or less solved the food and shelter problems.
They’d probably have to be more convincing, since convincing a human being out of a position they already hold is usually a more difficult task than convincing them to hold the position in the first place.
They’d probably have to be more convincing, since convincing a human being out of a position they already hold is usually a more difficult task than convincing them to hold the position in the first place.
Suppose I have a superhuman answering machine on one side: a tool that just lists a number of answers to my query, like a superhuman Google. On the other side I have the same tool in agent mode. Why would I be more convinced by the agent mode output?
An agent has an incentive to trick me, while the same algorithm, minus the agency module, will just output unbiased answers to my queries.
If the answers of the tool mode and the agent mode differ, then I naturally believe the tool mode output.
If, for example, the agent mode were going to drivel something about acausal trade while the tool mode would just output some post by Eliezer Yudkowsky explaining why I shouldn’t let the AI out of the box, then how could the agent mode possibly be more convincing? Especially since putting the answering algorithm into agent mode shouldn’t improve the answers.
why would I be more convinced by the agent mode output?
You wouldn’t, necessarily. Nor did I suggest that you would.
I also agree that if (AI in “agent mode”) does not have any advantages over (“tool mode” plus human agent), then there’s no reason to expect its output to be superior, though that’s completely tangential to the comment you replied to.
That said, it’s not clear to me that (AI in “agent mode”) necessarily lacks advantages over (“tool mode” plus human agent).
...though that’s completely tangential to the comment you replied to.
I think that anyone with the slightest idea that an AI in agent mode could have malicious intentions, and therefore give biased answers, would be just as easily swayed by counter-arguments made by a similarly capable algorithm.
I mean, we shouldn’t assume an idiot gatekeeper who has never heard of anything we’re talking about here. So the idea that an AI in agent mode could brainwash someone to the extent that it afterwards takes even stronger arguments to undo it seems rather far-fetched. (ETA: What’s it supposed to say? That the tool uses the same algorithms as itself but is somehow wrong in claiming that the AI in agent mode is trying to brainwash the gatekeeper?)
The idea is that, given a sufficiently strong AI in tool mode, it might be possible to counter any attempt to trick a gatekeeper. And if the tool mode agrees, then it probably is a good idea to let the AI out of the box. Although anyone familiar with the scenario would probably rather assume a systematic error elsewhere, e.g. a misinterpretation of one’s questions by the AI in tool mode.
I think that anyone with the slightest idea that an AI in agent mode could have malicious intentions, and therefore give biased answers, would be just as easily swayed by counter-arguments made by a similarly capable algorithm.
Ah, I see. Sure, OK, that’s apposite. Thanks for clarifying that.
I disagree with your prediction.
Keep in mind that we still have our tools that make us ‘vastly superhuman’ circa 2012, and I frankly don’t see how ‘automatic will’, for lack of a better term, is contributing anything here that would make the fully automated system competitive.
This is actually one of Greg Egan’s major objections: that superhuman tools come first, and that artificial agency won’t make those tools competitive against augmented humans. Further, any work done to ensure that an artificial agent is friendly can’t be applied to augmented humans.