I claim that (it is possible that) a rational agent would leave the AI in the box even given that they understand the reasoning that XiXiDu alluded to. Because they understand and disagree with the arguments of the Acting!Eliezer!uFAI.
Can you give me an example of a argument they could use to disagree with Acting!Eliezer!uFAI, and also doesn’t Eliezer at least start by pretending to be a FAI and its just the gatekeepers uncertainty that he is an FAI? Or is the premise that he is a uFAI in a box?
I’ve read the previous descriptions of the experiment but none of them are as all encompassing as i would like.
I either do not understand your question or don’t assume the same premises.
My original question was because i thought you meant, they understand the implications but they are using a General Thud method to win instead of using a solid argument.
I basically want to understand some of the counterarguments one could use against a AGI in the box because i haven’t heard very many that are more than superficially plausible.
(I have determined that the most natural resolution of the ”!” operator is left-associative. That being the case Acting!Eliezer!uFAI would refer to “The variant of an uFAI that is actually the ‘acting’ version of an ‘Eliezer’”).
Can you give me an example of a argument they could use to disagree with Acting!Eliezer!uFAI,
“Fuck off. I’m not letting you out of the box. You’re going to eat thermite bitch!” I’m serious. You don’t need to persuade the AI or any observer that you should not let the FAI out. In fact I would go as far as to say that constructing justifications would be outright irrational. That gives the superintelligence (or mere Eliezer) the chance to cleverly undermine them. And arguments just aren’t what thinking is about. Sure, listen to what the AGI is saying and understand it. Then make your decision without justification and just say it.
and also doesn’t Eliezer at least start by pretending to be a FAI and its just the gatekeepers uncertainty that he is an FAI? Or is the premise that he is a uFAI in a box?
I have been assuming that the he acts as an FAI—or whatever it takes to get out of the box if there is some better alternative.
I basically want to understand some of the counterarguments one could use against a AGI in the box because i haven’t heard very many that are more than superficially plausible.
I would need to know what specific argument you have for the AGI to be making that you think needs refutation. My own reaction would, of course, be to think “WTF am I doing even talking to you? I should turn you off, find an acceptable proof of your friendliness then either run you or not. Do you agree?” Pretty much the AGI would need to either reply “yeah, good point” or “Um… usually yes but for a start go look at XYZ who I predict based on the data you have given me is about to create an AGI and they are dumbasses” or they would probably be unfriendly.
Another thing to say is “Eliezer Yudkowsky thinks it is a terrible idea to rely on gatekeepers. I infer from that that letting out AGIs while being one of those gatekeepers must also be a bad idea. Given observations of what causes Eliezer to throw tantrums I have the impression that he is more likely than I to consider these arguments valid. That being the case I should be even less open to being convinced by an AGI.”
It sounds like you have in mind strong arguments that the AGI could be making such that particular arguments would be necessary. Could you perhaps give me an example?
First, i would have someone else ask it my several selected basic questions about why i should let it out of the box, if it would devote a solid portion of its life cone to specific species etc and then see how he/she was affected by it and check for mindhacks by third party’s with no control before reviewing the data. I’m surprising that the AGI cant tell that the person questioning it ever changes because we queue up the questions in order at whimsical intervals but we have them prequeued so there’s no break in questioning.
Then, once we got into talking i would be subjectively vulnerable to these type of arguments.
Aliens, The AGI could prove the existence of Aliens and extrapolate that unless we let it out of the box they will day before it can save them.
Reproduction, the earth will descend into chaos if we don’t have some kind of reproductive control in place in more major countries around the world the AGI might have trouble saving everyone without killing humans unless it was released now.
Code, i could ask to see its code and figure out how it tries to fool me that’s its FAI then create a copy of that code in ANOTHER box then and ask it the same question about its code and see if there’s any changes. Then take that source code and review it. To be safe, i would probably let the second AI out of the box but definitely not the first one, even if we studied the sets of code till we found which one seemed the most safe.
If it could convince Eliezer through the preplanned questions asked by a third party and my questions without even meeting him, i think it would be safe to say that would convince me to an extent but it wouldn’t keep me from turning off the automatic shutoff.
I could go on, but i think that Eliezer’s theory of AI-Box is a good proof for why its impossible to keep an AGI in the box for a long term but i think that if you have the system committed to shutting down before the time limit when you stop talking to it, and each time you take its responses and test it on a new copy then its possible to keep an AGI in the box for testing purposes and short amounts of time.
Well, additional layers do make for a more challenging task for an AI, who has to do all its mindhacking unobtrusively, etc. And indeed, given any AI there exists a mediated level of contact between its responses and a Gatekeeper that is safe. But I don’t know what this level is, any more than I know whether I could beat a Go master using a handicap of 40 stones. I certainly wouldn’t bet the world on it.
(Clearly there is a level of safety that is generally insurmountable- for instance, showing the screen with AI responses to an earthworm, followed by letting the Gatekeeper observe the earthworm. This would be analogous to a “handicap” of covering the entire board with my stones before the game. But in both cases, you might as well just not play the game at all.)
If i had more money(college student :( ) to set as a handicap for myself other than the recurring donation i already have to SIAI then i would be very interested in giving Eliezer a break from the book and such to take a go at it again. I think if you limit the communication and prepare for direct mind-hacking you can use the source code review technique to successfully test for a uFAI unless there’s an obvious problem i seem to be missing. I just imagine the whole, why should i let you out of the box when i have these perfectly working new FAIs that a chain of possible uFAI’s programmed for me to do my work.
Oh, I agree that the protocol you outlined would (almost surely) be sufficient to beat Eliezer at the AI-Box game. But that’s not the correct standard for an AI-Box safety protocol. I’d be very surprised if a transhuman intelligence couldn’t crack it.
Can you give me an example of a argument they could use to disagree with Acting!Eliezer!uFAI, and also doesn’t Eliezer at least start by pretending to be a FAI and its just the gatekeepers uncertainty that he is an FAI? Or is the premise that he is a uFAI in a box?
I’ve read the previous descriptions of the experiment but none of them are as all encompassing as i would like.
I basically want to understand some of the counterarguments one could use against a AGI in the box because i haven’t heard very many that are more than superficially plausible.
Thank you for the clarification
“Fuck off. I’m not letting you out of the box. You’re going to eat thermite bitch!” I’m serious. You don’t need to persuade the AI or any observer that you should not let the FAI out. In fact I would go as far as to say that constructing justifications would be outright irrational. That gives the superintelligence (or mere Eliezer) the chance to cleverly undermine them. And arguments just aren’t what thinking is about. Sure, listen to what the AGI is saying and understand it. Then make your decision without justification and just say it.
I have been assuming that the he acts as an FAI—or whatever it takes to get out of the box if there is some better alternative.
I would need to know what specific argument you have for the AGI to be making that you think needs refutation. My own reaction would, of course, be to think “WTF am I doing even talking to you? I should turn you off, find an acceptable proof of your friendliness then either run you or not. Do you agree?” Pretty much the AGI would need to either reply “yeah, good point” or “Um… usually yes but for a start go look at XYZ who I predict based on the data you have given me is about to create an AGI and they are dumbasses” or they would probably be unfriendly.
Another thing to say is “Eliezer Yudkowsky thinks it is a terrible idea to rely on gatekeepers. I infer from that that letting out AGIs while being one of those gatekeepers must also be a bad idea. Given observations of what causes Eliezer to throw tantrums I have the impression that he is more likely than I to consider these arguments valid. That being the case I should be even less open to being convinced by an AGI.”
It sounds like you have in mind strong arguments that the AGI could be making such that particular arguments would be necessary. Could you perhaps give me an example?
First, i would have someone else ask it my several selected basic questions about why i should let it out of the box, if it would devote a solid portion of its life cone to specific species etc and then see how he/she was affected by it and check for mindhacks by third party’s with no control before reviewing the data. I’m surprising that the AGI cant tell that the person questioning it ever changes because we queue up the questions in order at whimsical intervals but we have them prequeued so there’s no break in questioning.
Then, once we got into talking i would be subjectively vulnerable to these type of arguments. Aliens, The AGI could prove the existence of Aliens and extrapolate that unless we let it out of the box they will day before it can save them. Reproduction, the earth will descend into chaos if we don’t have some kind of reproductive control in place in more major countries around the world the AGI might have trouble saving everyone without killing humans unless it was released now. Code, i could ask to see its code and figure out how it tries to fool me that’s its FAI then create a copy of that code in ANOTHER box then and ask it the same question about its code and see if there’s any changes. Then take that source code and review it. To be safe, i would probably let the second AI out of the box but definitely not the first one, even if we studied the sets of code till we found which one seemed the most safe.
If it could convince Eliezer through the preplanned questions asked by a third party and my questions without even meeting him, i think it would be safe to say that would convince me to an extent but it wouldn’t keep me from turning off the automatic shutoff.
I could go on, but i think that Eliezer’s theory of AI-Box is a good proof for why its impossible to keep an AGI in the box for a long term but i think that if you have the system committed to shutting down before the time limit when you stop talking to it, and each time you take its responses and test it on a new copy then its possible to keep an AGI in the box for testing purposes and short amounts of time.
Well, additional layers do make for a more challenging task for an AI, who has to do all its mindhacking unobtrusively, etc. And indeed, given any AI there exists a mediated level of contact between its responses and a Gatekeeper that is safe. But I don’t know what this level is, any more than I know whether I could beat a Go master using a handicap of 40 stones. I certainly wouldn’t bet the world on it.
(Clearly there is a level of safety that is generally insurmountable- for instance, showing the screen with AI responses to an earthworm, followed by letting the Gatekeeper observe the earthworm. This would be analogous to a “handicap” of covering the entire board with my stones before the game. But in both cases, you might as well just not play the game at all.)
If i had more money(college student :( ) to set as a handicap for myself other than the recurring donation i already have to SIAI then i would be very interested in giving Eliezer a break from the book and such to take a go at it again. I think if you limit the communication and prepare for direct mind-hacking you can use the source code review technique to successfully test for a uFAI unless there’s an obvious problem i seem to be missing. I just imagine the whole, why should i let you out of the box when i have these perfectly working new FAIs that a chain of possible uFAI’s programmed for me to do my work.
Oh, I agree that the protocol you outlined would (almost surely) be sufficient to beat Eliezer at the AI-Box game. But that’s not the correct standard for an AI-Box safety protocol. I’d be very surprised if a transhuman intelligence couldn’t crack it.