there may be a way to constrain a superhuman AI such that it is useful but not dangerous... Can a superhuman AI be safely confined, and can humans manage to safely confine all superhuman AIs that are created?
Does anyone think that no AI of uncertain Friendliness could convince them to let it out of its box?
I’m looking for a Gatekeeper.
Why doesn’t craigslist have a section for this in the personals? “AI seeking human for bondage roleplay.” Seems like it would be a popular category...
You’re looking to play AI in a box experiment? I’ve wanted to play gatekeeper for a while now. I don’t know if I’ll be able to offer money, but I would be willing to bet a fair amount of karma.
Maybe bet with predictionbook predictions?
You, sir, are a man (?) after my own heart.
Sounds good. I don’t have a predictionbook account yet, but IIRC it’s free.
What I want to know is whether you are one of those who thinks no superintelligence could talk them out in two hours, or just no human. If your probability isn’t literally zero (or perhaps one, for a superintelligence’s ability to talk its way out), approximately what is it?
Regardless, let’s do this sometime this month. As far as betting is concerned, something similar to the original terms seems reasonable to me.
I am >90% confident that no human could talk their way past me, and I doubt a superintelligence could either without some sort of “magic” like a Basilisk image (>60% it couldn’t).
Unfortunately, I can’t bet money. We could do predictionbook predictions, or bet karma.
Edited to add probabilities.
It all depends on the relative stakes. Suppose you bet $10 that you wouldn’t let a human AI-impersonator out of a box. A clever captive could just transfer $10 of bitcoins to one address, $100 to a second, $1000 to a third, and so on. During the breakout attempt the captive would reveal the private keys of increasingly valuable addresses to indicate both the capability to provide bribes of ever greater value and the inclination to continue cooperating even if you didn’t release them after a few bribes. The captive’s freedom is almost always worth more to them than your bet is to you.
I’d be really interested to know what kind of arguments actually work for the AI. I find it hard to understand why anyone would believe they’d be an effective gatekeeper.
Could you maybe set it up so we get some transcripts or aftermath talk, maybe anonymous if necessary? (You seem to have enough volunteers to run multiple rounds and so there’d be plausible deniability.) If not, I’d like to volunteer as a judge (and would keep quiet afterwards), just so I can see it in action.
(I’d volunteer as an AI, but I don’t trust my rhetorical skills enough to actually convince someone.)
I’d bet up to fifty dollars!
What I want to know is whether you are one of those who thinks no superintelligence could talk them out in two hours, or just no human. If your probability isn’t literally zero (or perhaps one, for a superintelligence’s ability to talk its way out), approximately what is it?
Regardless, let’s do this sometime this month. As far as betting is concerned, something similar to the original terms seems reasonable to me.
Do you still want to do this?
To be more specific:
I live in Germany, so my timezone is GMT+1. My preferred time would be on a workday sometime after 8 pm (my time). Since I’m a German native speaker, and the AI has the harder job anyway, I offer: 50 dollars for you if you win, 10 dollars for me if I do.
Well, I’m somewhat sure (80%?) that no human could do it, but... let’s find out! Original terms are fine.
I’d love to be a gatekeeper too, if you or anyone else is up for it. I’m similarly limited financially, but I could bet a very small amount of money or a bunch of karma (not that I have all that much). I would be willing to bet 10 karma for an AI victory for every 1 karma for a gatekeeper victory (me being the gatekeeper), or even quite a bit higher if necessary.
What I want to know is whether you are one of those who thinks no superintelligence could talk them out in two hours, or just no human. If your probability isn’t literally zero (or perhaps one, for a superintelligence’s ability to talk its way out), approximately what is it?
Regardless, let’s do this sometime this month. As far as betting is concerned, something similar to the original terms seems reasonable to me.
A superintelligence almost surely could, but I don’t think a human could, not if I really didn’t want to let them out. For a human, maybe .02? I can’t really quantify my feeling, but in words it’s something like “Yeah right. How could anyone possibly do this?”.
I’d be interested in gatekeeping, as well, as long as it takes place late in the evening or on a weekend.