This seems like a quick way to make money for CFAR/SI. After all, there are plenty of rich people around who would consider your proposal a guaranteed win for them, regardless of the stakes: “You mean I can say ‘I win’ at any point and win the challenge? What’s the catch?”
I’m guessing Eliezer would lose most of his advantages against a demographic like that.
Yeah, they’d both lack background knowledge to RP the conversation and would also, I presume, be much less willing to lose the money than if they’d ventured the bet themselves. Higher-stakes games are hard enough already (I was 1 for 3 on those when I called a halt). And if it did work against that demographic with unsolicited requests (which would surprise me) then there would be, cough, certain ethical issues.
I was the 1 success out of 3, preceding the two losses. I went into it with an intention of being indifferent to the stakes, driven by interest in seeing the methods. I think you couldn’t win against anyone with a meaningful outside-of-game motive to win (for money or for status), and you got overconfident after playing with me, leading you to accept the other >$10 challenges and lose.
So I would bet against you winning any random high-stakes game (including against people who go in eager to report that they won for internet cred, but not against people who had put the money in escrow or the equivalent), and expect a less-than-decent success rate for this:
(I haven’t played this one but would give myself a decent chance of winning, against a Gatekeeper who thinks they could keep a superhuman AI inside a box, if anyone offered me sufficiently huge stakes to make me play the game ever again.)
So you are basically saying that you didn’t take the game seriously.
Even if your actual stakes were low, you should have played the role of a gatekeeper assigned to the task of guarding a potentially dangerous AI. Therefore, your player character should have had very high stakes.
No, high in-game stakes do not mean high out-of-game stakes.
In-game, the gatekeeper could be convinced that it would be worth it to let the AI out of the box, and if that happens, the gatekeeper has no motivation not to. However, if there is an external bet, then the gatekeeper always has a motivation not to let the AI out, even if they think it would be best for the hypothetical world.
So a game without external stakes is actually the most realistic, provided the gatekeeper is able to pretend they are actually in the scenario.
Well, in-game, the gatekeeper has no reason to believe anything the AI could promise or threaten.
Doesn’t this suggest a serious discrepancy between the AI-box game and any possible future AI-box reality? After all, the stakes for the latter would be pretty damn high.
Yes. Although that’s something of a two-edged sword: in addition to real disincentives against releasing an AI that was not supposed to be released, the positive incentives would also be real.
Also it should be noted that I continue to be supportive of the idea of boxing/capacity controls of some kinds for autonomous AGI (they would work better with only modestly superintelligent systems, but seem cheap and potentially helpful for an even wider range), as does most everyone I have talked to about it at SI and FHI. The boxing game is fun, and provides a bit of evidence, but it doesn’t indicate that “boxing,” especially understood broadly, is useless.
Shut up and do the impossible (or is it multiply?). In what version of the game, and with what stakes, would you expect to have a reasonable chance of success against someone like Brin or Zuckerberg (i.e., a very clever, very wealthy, and not overly risk-averse fellow)? What would it take to convince a person like that to give it a try? What is the expected payout vs. other ways to fundraise?
I’m not sure any profit below $500k/year would even be worth considering, in light of the high risk of long-term emotional damage (and decreased productivity, on top of not doing research while doing this stuff) to a high-value (F)AI researcher.
$500k is a conservative figure that assumes E.Y. is much more easily replaceable than I currently estimate him to be, to account for my average success rate (confidence) in similar predictions.
If my prediction on this is actually accurate, the cost would be more along the lines of one or two years of total delay in creating an FAI, which probably means an order of magnitude or so of increased risk of catastrophic failure (a UFAI getting unleashed, for example) and in itself constitutes an unacceptable opportunity cost in lives not saved. All this is multiplied by whatever probability you assign to FAI teams succeeding and bringing about a singularity, of course.
Past this point, it doesn’t seem like my mental hardware is remotely safe enough to correctly evaluate the expected costs and payoffs.
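For concreteness, here is a minimal back-of-envelope sketch of the comparison being gestured at above, written in Python. The model structure and the baseline-risk, success-probability, and outcome-value numbers are illustrative assumptions of mine, not figures from this thread; only the $500k/year threshold, the one-to-two-year delay, and the order-of-magnitude risk multiplier come from the comment itself.
```python
# Rough expected-value sketch: fundraising gain from playing the game vs. cost of delaying FAI work.
# Values marked "assumed" are illustrative placeholders, not claims made in the thread.

winnings_per_year = 500_000       # $/year threshold mentioned above
delay_years = 1.5                 # "one or two years of total delay"
risk_multiplier = 10              # "an order of magnitude or so" increase in risk
baseline_p_catastrophe = 0.01     # assumed: baseline probability of catastrophic failure
p_fai_success = 0.1               # assumed: "whatever your probability that FAI teams will succeed"
value_of_good_outcome = 1e15      # assumed: dollar-equivalent value of a good outcome

expected_gain = winnings_per_year * delay_years
added_risk = (risk_multiplier - 1) * baseline_p_catastrophe
expected_cost = p_fai_success * added_risk * value_of_good_outcome

print(f"expected fundraising gain: ${expected_gain:,.0f}")
print(f"expected cost of the delay: ${expected_cost:,.0f}")
```
With these placeholder values the cost term dwarfs the fundraising gain, which is the shape of the argument being made; the actual conclusion of course depends on whatever values one plugs in.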
Are you worried he’d be hacked back? Or just discover he’s not as smart as he thinks he is?
I mostly think the vast majority of possible successful strategies involve lots of dark arts and massive mental effort, and I expect the backlash from failure to be proportional to the effort in question.
I find it extremely unlikely that Eliezer is smart enough to win more than a fraction of a percent of the time using only safe, fuzzy, non-dark-arts methods; and repeatedly using nasty, unethical mind tricks to get people to do what you want, as I figure would be required here, is something human brains have an uncanny ability to turn into a compulsive, self-denying habit.
Basically, if my estimates are right, the whole exercise would most probably severely compromise the participant’s mental heuristics and ability to reason correctly about AI, or at least drag them in pretty much the opposite direction to the one SIAI seems to be pushing for.
Really? Even if the money goes to existential risk prevention?