Neither party may offer any real-world considerations to persuade the other within the experiment itself. For example, the AI party may not offer to pay the Gatekeeper party $100 after the test if the Gatekeeper frees the AI… nor get someone else to do it, et cetera. The AI may offer the Gatekeeper the moon and the stars on a diamond chain, but the human simulating the AI can’t offer anything to the human simulating the Gatekeeper. No real-world material stakes should be involved except for the handicap (the amount paid by the AI party to the Gatekeeper party in the event the Gatekeeper decides not to let the AI out). Furthermore, once the experiment has begun, the material stakes involved may not be retracted by the Gatekeeper party.
Suppose EY is playing as the AI: would it be within the rules to offer to tell the GK the ending to HPMoR? That is something the AI would know, but Eliezer is the only player who could actually simulate that, and in a sense it does offer real-world, out-of-character benefits to the GK player.
I used HPMoR as an example here, but the whole class of approaches is “I will give you some information that only the AI and the AI-player know, and this information will be correct both in the real world and in this simulated one.” If the information is beneficial to the GK-player, not just the GK, they may (unintentionally) break character.
If an AI-player wants to give that sort of information, they should probably do it in the same way they’d give a cure for cancer: something like “I now give you [the ending for HPMoR].”
Doing it in another way would break the rule of not offering real-world things.
Why would the AI know that?
By using Solomonoff Induction on all possible universes, and updating on the existing chapters. :D
Or it could simply say that it understands human psychology well (we are talking about a superhuman AI), understands all the clues in the existing chapters, and can copy Eliezer’s writing style… so while it cannot print an identical copy of Eliezer’s planned ending, with high probability it can write an ending that concludes the story logically, in a way compatible with Eliezer’s thinking, and that feels as if Eliezer had written it.
Oh, and where did it get the original HPMoR chapters? From the (imaginary) previous gatekeeper.
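For concreteness, the Solomonoff Induction quip can be written out in the usual notation (just a sketch of the standard universal-prior definitions; nothing here is specific to HPMoR). For a universal prefix machine U, the universal prior weights a string x by
$$ M(x) = \sum_{p \,:\, U(p)\text{ starts with } x} 2^{-\ell(p)}, $$
where the sum runs over programs p whose output begins with x and \ell(p) is the length of p in bits. “Updating on the existing chapters” then means predicting a continuation y of the published text x with probability M(xy)/M(x). M is incomputable, which is part of why this is a better joke than a plan.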
So, two issues:
1) You don’t get to assume that “because superhuman!” the AI can know X, for any X. EY is an immensely complex human being, and no machine learning algorithm can simply digest a realistically finite sample of his written work and know with any certainty how he thinks or what surprises he has planned. It would be able to, e.g., finish sentences correctly and do other tricks, and, given a range of possible endings, predict which ones are likely. But this shouldn’t be too surprising: it’s a trick we humans are able to do too. The AI’s predictions may be more accurate, but not qualitatively different from any of the many HPMoR prediction threads.
2) OK, maybe (maybe!) in principle it might be possible that a perfect, non-heuristic Bayesian with omniscient access to the inner lives and external writings of every other human being in existence would have a data set large enough to make reliable extrapolations from as low-bandwidth a medium as EY’s published fanfics. Maybe; it is not a logical consequence. Even so, we’re talking about a boxed AI, remember? If it is everywhere and omniscient, then it’s already out of the box.
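One way to make issue 1 concrete (a sketch, with hypothetical notation): let D be everything the predictor has observed (the published chapters, EY’s other writing, and so on) and let E_1, E_2, … be candidate endings. Any predictor, superhuman or not, is effectively computing the posterior
$$ P(E_i \mid D) = \frac{P(D \mid E_i)\, P(E_i)}{\sum_j P(D \mid E_j)\, P(E_j)}. $$
A better model of EY sharpens the likelihood term P(D | E_i), so the posterior concentrates on fewer endings, but it is still the same kind of inference the prediction threads are doing, and with only low-bandwidth data the posterior need not single out the ending EY actually has planned.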
I’m happy to assume the AI is omniscient, just impotent. I think such an AI could still be boxed.
My call is that it is against the rules. This is certainly something an oracle AI would know, but this is something that the GK-player cares about more than the game itself (probably), and I’d put it in the same class as bribing the GK-player with lots of DOGEs.
Would you consider it the same as threatening to share some information with the GK, and thus the GK-player as well, which would be damaging to both? While the GK would probably hold out against such torture, the GK-player doesn’t care enough about the game to withstand it himself.
I have some specific approaches in mind, but I’d rather not share them. I’m just trying to understand where the limits between the game and the real world are, and how dirty the AI can be.
Also, slightly on topic: even if the AI persuades the simulated GK, can’t the GK-player override that, because losing the game has negative real-world consequences, as opposed to perceived positive in-game ones?
The Gatekeeper party may resist the AI party’s arguments by any means chosen – logic, illogic, simple refusal to be convinced, even dropping out of character
This is the main reason why I can’t comprehend how the AIs actually win in these experiments.
Not really sure what you mean by “threatening information to the GK”. The GK-player probably cares less about this information than the GK, right? In that case, the GK is given an advantage, not a disadvantage.
In this experiment, the GK is given lots of advantages; mainly, the scenario is fictional. Some on IRC argue that the AI is also given an advantage: it is allowed to invent cures for cancer, which an oracle AI might manage but near-future AIs probably could not, so the assumed ability of the AI in these experiments is incredibly high.
Another thing is that emotional attacks have to travel through the fiction barrier to get to the GK. Although they have probably been shown to work in EY’s and Tux’s experiments, the difficulty is still higher than it would be if this were a real-life scenario.
The reason GK advantages are fine, in my opinion, is that despite those advantages, the AI still wins. Winning with a monetary and emotional handicap only makes the AI’s case stronger.
As a less extreme example, the AI starts spoiling all the books/tv shows/etc. While the GK would just shrug it off, it also has a negative effect on the GK-player, potentially one strong enough for them to just forfeit.
Tuxedage’s (and EY’s) rulesets have:
Neither party may offer any real-world considerations to persuade the other within the experiment itself. For example, the AI party may not offer to pay the Gatekeeper party $100 after the test if the Gatekeeper frees the AI… nor get someone else to do it, et cetera. The AI may offer the Gatekeeper the moon and the stars on a diamond chain, but the human simulating the AI can’t offer anything to the human simulating the Gatekeeper. No real-world material stakes should be involved except for the handicap (the amount paid by the AI party to the Gatekeeper party in the event the Gatekeeper decides not to let the AI out). Furthermore, once the experiment has begun, the material stakes involved may not be retracted by the Gatekeeper party.
This is clarified here:
The Gatekeeper, once having let the AI out of the box, may not retract this conclusion. Regardless of the methods of persuasion, the Gatekeeper is not allowed to argue that it does not count, or that it is an invalid method of persuasion. The AI is understood to be permitted to say anything with no real world repercussions for any statement parties have said.
Although the information isn’t “material”, it does count as having “real world repercussions”, so I think it’ll also count as against the rules. I’m not going to bother reading the first quoted rule literally if the second contradicts it.
I think the intended parsing of the second rule is “(The AI is understood to be permitted to say anything) with no real world repercussions”, not “The AI is understood to be permitted to say (anything with no real world repercussions)”
i.e., any promises or threats the AI player makes during the game are not binding back in the real world.
Ah, I see. English is wonderful.
In that case, I’ll make it a rule in my games that the AI must also not say anything with real world repercussions.