Yes, certainly. This is mainly directed toward those people who are confused about what anyone could possibly say to them through a text terminal that would be worth forfeiting $10 in winnings. I point this out because I think the people who believe nobody could convince them when there’s $10 on the line aren’t being creative enough in imagining what the AI could offer that would make voluntarily losing the game worthwhile.
In a real-life situation with a real AI in a box posing a real threat to humanity, I doubt anyone would care much about a captivating novel, which is why I say it’s tongue-in-cheek. But just as losing $10 is a poor substitute incentive for humanity’s demise, an entertaining novel is a poor substitute for what a superintelligence might communicate through a text terminal.
Most of the discussions I’ve seen so far involve the AI trying to convince the Gatekeeper that it’s friendly through pretty sketchy in-roleplay logical arguments (like “my source code has been inspected by experts”), or in-roleplay offers like “your child has cancer and only I can cure it”, which is easy enough to disregard by stepping out of character, even though it might be much more compelling if your child actually had cancer. A real gatekeeper might be convinced by that line, but a roleplaying Gatekeeper would not (unless they were more serious about roleplaying than about winning money). So I hope to illustrate that the AI can step out of the roleplay in its bargaining, even while staying within the constraints of the rules; if the AI actually just spent two hours typing out a beautiful and engrossing story with a cliffhanger ending, there are people who would forfeit the money to see it finished.
The AI’s goal is to get the Gatekeeper to let it out, and that alone; an AI player who is going all-out to win should not handicap themselves with other imagined objectives (such as convincing the Gatekeeper that it would be safe to let the AI out). As another example, the AI can even persuade the Gatekeeper to reinterpret the rules in the AI’s favour (to the extent that this is within the Gatekeeper’s power, as mandated by the original rules).
I just hope to get people thinking along other lines, that’s all. There are sideways and upside-down ways of attacking the problem. It doesn’t have to come down to discussions about expected utility calculations.
(Edit: by “discussions I’ve seen so far”, I’m referring to public blog posts and comments; I am not privy to any confidential information.)