Assuming none of this is fabricated or exaggerated, every time I read these I feel like something is really wrong with my imagination. I can sort of imagine someone agreeing to let the AI out of the box, but I fully admit that I can’t really imagine anything that would elicit these sorts of emotions between two mentally healthy parties communicating by text-only terminals, especially with the prohibition on real-world consequences. I also can’t imagine what sort of unethical actions could be committed within these bounds, given the explicitly worded consent form. Even if you knew a lot of things about me personally, as long as you weren’t allowed to actually, real-world, blackmail me...I just can’t see these intense emotional exchanges happening.
Am I the only one here? Am I just not imagining hard enough? I’m actually at the point where I’m leaning towards the whole thing being fabricated—fiction is more confusing than truth, etc. If it isn’t fabricated, I hope that statement is taken not as an accusation, but as an expression of how strange this whole thing seems to me, that my incredulity is straining through despite the incredible extent to which the people making claims seem trustworthy.
It’s that I can’t imagine this game invoking any negative emotions stronger than sad novels and movies.
What’s surprising is that Tuxedage seems to be actually hurt by this process, and that s/he seems to actually fear mentally damaging the other party.
In our daily lives we don’t usually censor emotionally volatile content for fear that it might harm the population. The fact that Tuxedage seems to be more ethically apprehensive about this than s/he might be about, say, writing a sad novel, is what is surprising.
I don’t think s/he would show this level of apprehension about, say, making someone sit through Grave of the Fireflies. If s/he can actually evoke emotions more intense than that through text-only terminals to a stranger, then whatever s/he is doing is almost art.
Some people cry over sad novels which they know are purely fictional. Some people fall in love over text. What’s so surprising?
That’s real-world, where you can tell someone you’ll visit them and there is a chance of real-world consequence. This is explicitly negotiated pretend play in which no real-world promises are allowed.
There’s no particular reason why you should assume both parties are mentally healthy, given how common mental illness is.
I...suppose? I imagine you’d have to have a specific brand of emotional volatility combined with immense suggestibility for this sort of thing to actually damage you. You’d have to be the sort of person who can be hypnotized against their will to do and feel things they actually don’t want to do and feel.
At least, that’s what I imagine. My imagination apparently sucks.
We actually censor emotional content CONSTANTLY. It’s very rare to hear someone say “I hate you” or “I think you’re an evil person”. You don’t tell most people you’re attracted to that you want to fuck them, and when asked by someone if they look good, it’s pretty expected of one to lie if they look bad, or at least soften the blow.
You are right, but again, that’s all real world stuff with real world consequences.
What puzzles me is specifically that people continue to feel these emotions after it has already been established that it’s all pretend.
Come to think of it I have said things like “I hate you” and “you are such a bad person” in pretend contexts. But it was pretend, it was a game, and it didn’t actually affect anyone.
People are generally not that good at restricting their emotional responses to interactions with real world consequences or implications.
Here’s something one of my psychology professors recounted to me, which I’ve often found valuable to keep in mind. In one experiment on social isolation, test subjects were made to play virtual games of catch with two other players, where each player is represented as an avatar on a screen and is able to offer no input except for deciding which of the other players to throw the virtual “ball” to. No player has any contact with the others, nor is any player aware of their identity or any information about them. However, two of the “players” in each experiment are actually confederates of the researcher, whose role is to gradually start excluding the real test subject by passing the ball to them less and less, eventually almost completely locking them out of the game of catch.
This type of experiment will no longer be approved by the Institutional Review Board. It was found to be too emotionally taxing on the test subjects, despite the fact that the experiment had no real world consequences, and the individuals “excluding” them had no access to any identifying information about them.
Keep in mind that, while works of fiction such as books and movies can have powerful emotional effects on people, they’re separated from activities such as the AI box experiment by the fact that the audience members aren’t actors in the narrative. The events of the narrative aren’t just pretend, they’re also happening to someone else.
As an aside, I’d be wary about assuming that nobody was actually affected when you said things like “I hate you” or “you are a bad person” in pretend contexts, unless you have some very reliable evidence to that effect. I certainly know I’ve said potentially hurtful things in contexts where I supposed nobody could possibly take them seriously, only to find out afterwards that people had been really hurt, but hadn’t wanted to admit it to my face.
This type of experiment will no longer be approved by the Institutional Review Board. It was found to be too emotionally taxing on the test subjects, despite the fact that the experiment had no real world consequences, and the individuals “excluding” them had no access to any identifying information about them.
So, two possibilities here: 1) The experiment really was emotionally taxing and humans are really fragile 2) When it comes to certain narrow domains, the IRB standards are hyper-cautious, probably for the purpose of avoiding PR issues between scientists and the public. We as a society allow our children to experience 100x worse treatment on the school playground, something that could easily be avoided by simply having an adult watch the kids.
Note that if you accept that people really are that emotionally fragile, it follows from other observations that even when it comes to their own children, no one seems to know or care enough to act accordingly (except the IRB, apparently). I’m not really cynical enough to believe that one.
As an aside, I’d be wary about assuming that nobody was actually affected …I certainly know I’ve said potentially hurtful things in contexts where I supposed nobody could possibly take them seriously.
Humorous statements often obliquely reference a truth of some sort. That’s why they can be hurtful, even when they don’t actually contain any truth.
I’m fairly confident, but since the experiment is costless I will ask them directly.
So, two possibilities here: 1) The experiment really was emotionally taxing and humans are really fragile 2) When it comes to certain narrow domains, the IRB standards are hyper-cautious, probably for the purpose of avoiding PR issues between scientists and the public.
I’d say it’s some measure of both. According to my professor, the experiment was particularly emotionally taxing on the participants, but on the other hand, the IRB is somewhat notoriously hypervigilant when it comes to procedures which are physically or emotionally painful for test subjects.
Even secure, healthy people in industrialized countries are regularly exposed to experiences which would be too distressing to be permitted in an experiment by the IRB. But “too distressing to be permitted in an experiment by the IRB” is still a distinctly non-negligible level of distress, rather more than most people suspect would be associated with exclusion of one’s virtual avatar in a computer game with no associated real-life judgment or implications.
In addition to the points in my other comment, I’ll note that there’s a rather easy way to apply real-world implications to a fictional scenario. Attack qualities of the other player’s fictional representative that also apply to them in real life.
For instance, if you were to convince someone in the context of a roleplay that eating livestock is morally equivalent to eating children, and the other player in the roleplay eats livestock, you’ve effectively convinced them that they’re committing an act morally equivalent to eating children in real life. The fact that the point was discussed in the context of a fictional narrative is really irrelevant.
I imagine you’d have to have a specific brand of emotional volatility combined with immense suggestibility for this sort of thing to actually damage you.
This might be surprisingly common on this forum.
Somebody once posted a purely intellectual argument and there were people who were so shocked by it that apparently they were having nightmares and even contemplated suicide.
Somebody once posted a purely intellectual argument and there were people who were so shocked by it that apparently they were having nightmares and even contemplated suicide.
Can I get a link to that?
Don’t misunderstand me; I absolutely believe you here, I just really want to read something that had such an effect on people. It sounds fascinating.
What is being referred to is the meme known as Roko’s Basilisk, which Eliezer threw a fit over and deleted from the site. If you google that phrase you can find discussions of it elsewhere. All of the following have been claimed about it:
Merely knowing what it is can expose you to a real possibility of a worse fate than you can possibly imagine.
I’m not exactly fit to throw stones on the topic of unreasonable fears, but you get worse than this from your average “fire and brimstone” preacher and even the people in the pews walk out at 11 yawning.
Googling the phrase “fear of hell” turns up a lot of Christian angst. Including recursive angst over whether you’ll be sent to hell anyway if you’re afraid of being sent to hell. For example:
I want to be saved and go to heaven and I believe, but I also have this terrible fear of hell and I fear that it may keep me out of heaven. PLEASE HELP!
And here’s a hadephobic testament from the 19th century.
From the point of view of a rationalist who takes the issue of Friendly AGI seriously, the difference between the Christian doctrines of hell and the possible hells created by future AGIs is that the former is a baseless myth and the latter is a real possibility, even given a Friendly Intelligence whose love for humanity surpasses human understanding, if you are not careful to adopt correct views regarding your relationship to it.
A Christian sceptic about AGI would, of course, say exactly the same. :)
Oh, all this excitement was basically a modern-day reincarnation of the old joke...
It seems a Christian missionary was visiting with remote Inuit (aka Eskimo) people in the Arctic, and had explained to this particular man that if one believed in Jesus, one would go to heaven, while those who didn’t would go to hell.
The Inuit asked, “What about all the people who have never heard of your Jesus? Are they all going to hell?”
The missionary explained, “No, of course not. God wants you to have a choice. God is a merciful God, he would never send anyone to hell who’d never heard of Jesus.”
The Inuit replied, “Then why did you tell me?”
On the other hand, if the missionary tried to suppress all mentions of Jesus, he would still increase the number of people who hear about him (at least if he does so in the 2000s on the public Internet), because of the Streisand effect.
If you want to read the original post, there’s a cached version linked from RationalWiki’s LessWrong page.
Basically, it’s not just what RichardKennaway wrote. It’s what Richard wrote along with a rational argument that makes it all at least vaguely plausible. (Also depending on how you take the rational argument, ignorance won’t necessarily save you.)
I don’t know what you refer to but is that surprising? An intellectual argument can in theory convince anyone of some fact, and knowing facts can have that effect. Like people learning their religion was false, or finding out you are in a simulation, or that you are going to die or be tortured for eternity or something like that, etc.
Yeah...I’ve been chalking that all up to “domain expert who is smarter than me and doesn’t wish to deceive me is taking this seriously, so I will too” heuristic. I suppose “overactive imagination” is another reasonable explanation.
(In my opinion, a better heuristic for when you don’t understand the domain and have access to only one expert is: “Domain expert who is smarter than me and doesn’t wish to deceive me tells me that it is the consensus of all the smartest and best domain experts that this is true”.)
I’d guess that Tuxedage is hurt in the same way the gatekeeper is, because he has to imagine whatever horrors he inflicts on his opponent. Doing so causes at least part of that pain (and empathy or whatever emotion is at work) in him too. He has the easier part because he uses it as a tool and his mind has one extra layer of story-telling where he can tell himself “it’s all a story”. But part of that story is winning, and if he doesn’t win, part of those horrors falls back on him.
Consider someone for whom there are one or two specific subjects that will cause them a great deal of distress. These are particular to the individual: even if something in the wild reminds them of it, it’s so indirect and clearly not targeted that it would be rare for anyone to actually find it without getting into the individual’s confidence.
Now, put that individual alone with a transhuman intelligence trying to gain write access to the world at all costs.
I’m not convinced this sort of attack was involved in the AI box experiments, but it’s both the sort of thing that could have a strong emotional impact, and the sort of thing that would leave both parties willing to keep the logs private.
I guess I kind of excluded the category of individuals who have these triggers with the “mentally healthy” consideration. I assumed that the average person doesn’t have topics that they are unable to even think about without incapacitating emotional consequences. I certainly believe that such people exist, but I didn’t think it was that common.
Am I wrong about this? Do many other people have certain topics they can’t even think about without experiencing trauma? I suppose they wouldn’t...couldn’t tell me about it if they did, but I think I’ve got sufficient empathy to see some evidence if everyone were holding PTSD-sized mental wounds just beneath the surface.
We spend a lot of time talking about avoiding thought suppression. It’s a huge impediment for a rationalist if there is anything they mustn’t think about, and obviously it’s painful. Should we be talking more about how to patch mental wounds?
I’m mostly mentally healthy, and I don’t have any triggers in the PTSD-sense. But there are topics that I literally can’t think rationally about and that, if I dwell on them, either depress or enrage me.
I consider myself very balanced, but this balance involves avoiding certain extremes. Emotional extremes. There are some realms of imagination concerning pain and suffering that would make me cringe with empathy and bring me to tears, and make me want to help or possibly run away screaming in panic and fear, if I saw them. Even imagining such things is difficult and possible only in abstract terms, lest it actually cause such a reaction in me. Or else I’d become dull to it (which is a protection mechanism). Sure, dealing with such horrors can be trained; otherwise people couldn’t stand horror movies, which force one to separate the real from the imagined. But then I don’t see any need to train this (and risk losing my empathy even slightly).
It’s not fabricated, be sure of that (knowing Tuxedage from IRC, I’d put the odds at 100,000:1 or more against fabrication). And yes, it’s strange. I, too, cannot imagine what someone could possibly say that would make me get even close to considering letting them out of the box. Yet those who are complacent about it are the most susceptible.
knowing Tuxedage from IRC, I’d put the odds at 100,000:1 or more against fabrication
I know this is off-topic, but is it really justifiable to put such high odds on this? I wouldn’t use such high odds even if I had known the person intimately for years. Is it justifiable or is this just my paranoid way of thinking?
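For scale, converting those odds to a probability (standard odds-to-probability arithmetic, nothing specific to this exchange):

\[ P(\text{fabrication}) = \frac{1}{100{,}000 + 1} \approx 10^{-5} \]

A calibrated estimator at that level would have to be wrong at most about once in a hundred thousand comparable judgments of an acquaintance’s honesty, which is roughly what the question above is asking about.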
Yet those who are complacent about it are the most susceptible.
That sounds similar to hypnosis, to which a lot of people are susceptible but few think they are. So if you want a practical example of AI escaping the box just imagine an operator staring at a screen for hours with an AI that is very adept at judging and influencing the state of human hypnosis. And that’s only a fairly narrow approach to success for the AI, and one that has been publicly demonstrated for centuries to work on a lot of people.
Personally, I think I could win the game against a human but only by keeping in mind the fact that it was a game at all times. If that thought ever lapsed, I would be just as susceptible as anyone else. Presumably that is one aspect of Tuxedage’s focus on surprise. The requirement to actively respond to the AI is probably the biggest challenge because it requires focusing attention on whatever the AI says. In a real AI-box situation I would probably lose fairly quickly.
Now what I really want to see is an AI-box experiment where the Gatekeeper wins early by convincing the AI to become Friendly.
Now what I really want to see is an AI-box experiment where the Gatekeeper wins early by convincing the AI to become Friendly.
Not quite the same, but have you read Watchmen? Specifically, the conversation that fvyx fcrpger naq qe znaunggna unir ba znef. (Disclaimer: it’s been a while since I read it and I make no claims on the strength of this argument.)
Yeah, my gut doesn’t feel like it’s fabricated—Tuxedage and Eliezer would have to both be in on it and that seems really unlikely. And I can’t think of a motive, except perhaps as some sort of public lesson in noticing confusion, and that too seems far fetched.
I’ve just picked up the whole “if it’s really surprising it might be because it’s not true” instinct from having been burned in the past by believing scientific findings that were later debunked, and now Lesswrong has condensed that instinct into a snappy little “notice confusion” cache. And this is pretty confusing.
I suppose a fabrication would be more confusing, in one sense.
Yeah, I think appealing to fabrication can be a bit hand-wavy sometimes. Like you’re saying it’s fabricated the way other things are fabricated (since, as we all know, fabrication happens). But not every fabrication is the same or equally easy to pull off. To say it was fabricated doesn’t say anything about how it was. But that’s not even a question that enters one’s mind when they think of fabrication. How? Well, how anything else is fabricated, of course...
it can be as much a reaction of disbelief as it is an alternative explanation.
I thought about playing the gatekeeper part and started to imagine tactics that might be used on me. I came up with multiple tactics that might work or at least hurt me. But I think it would be ‘easier’ for me to not let the AI out in real life than in the game (not that I am entirely sure that I couldn’t fail nonetheless). Both are for basically the same reason: empathy.
As the AI player would quickly find out I am very caring and even the imagination of harm and pain hurts me (I know that this is a weak spot but I also see benefits in it).
Thus one approach that would work on me is that the AI player could induce sufficient horror that I’d want him to stop by letting him out (after all it’s just a game).
This same approach wouldn’t work with a real AI exactly because then it is no game and my horror is balanced by the horror for all of humanity for which I’d happily bear some smaller psychic horror. And then in real life there are more ways to get away from the terminal.
There are other attacks that might work but I will not go in details there.
Note that I definitely wouldn’t recommend myself as a real gatekeeper.
This same approach wouldn’t work with a real AI exactly because then it is no game and my horror is balanced by the horror for all of humanity for which I’d happily bear some smaller psychic horror. And then in real life there are more ways to get away from the terminal.
Interesting. This seems like the main problem is that you don’t really care about winning. So, what if there was some cash (Say, an amount equal to roughly 5% of your monthly income) on the line?
For what amount of cash would you risk your mental balance/mental health?
Everybody has to answer this question. This is a real-life question for health care personnel, some doctors, prison guards, and military personnel. Some jobs cost (or risk or offset or dull) you your empathy (or other emotions).
Happy are those who can avoid these parts of human labor. And praise to the courage (or calling) of those who do them.
I guess it’s hard for me to understand this because I view myself as immune to mental health harms resulting from horrifying stimuli that I know to be fictional. Even if it’s not fictional, the bulk of my emotions will remain unrecruited unless something I care about is being threatened.
It would take quite a lot of cash for me to risk an actual threat to my mental health...like being chronically pumped with LSD for a week, or getting a concussion, or having a variable beeping noise interrupt me every few seconds. But an AI box game would fall on a boring-stimulating spectrum, not a mental damage one.
What if another human’s happiness was on the line? After you’ve given a response to that question, qbrf lbhe bcvavba nobhg gur zbarl dhrfgvba punatr vs V cbvag bhg gung lbh pna qbangr vg naq fvtavsvpnagyl rnfr fbzrbar’f fhssrevat? Fnzr zbargnel nzbhag.
I am quite able to flood myself with happiness. I do not need LSD for that. And I assume that it can be just as addictive. I assume that I am just as able to flood myself with sadness and dread. And I fear the consequences. Thus taking LSD and doing the AI box experiment are not that different for me. As I said, that is my weak spot.
I thought that the answer to the ‘other person’ question was implied by my post. I’ll bear a lot if other people, especially those I care for, are suffering.
After rot13 I better understand your question. You seem to imply that if I bear the AI experiment, some funding will go to suffering people. Trading suffering in a utilitarian sense. Interesting. No, that doesn’t seem to weigh up for me.
It’s not fabricated. I had the same incredulity as you, but if you just take a few hours to think really hard about AI strategies, I think you will get a much better understanding.
I can think of AI strategies, but they would hardly be effective against a rational human really motivated to win.
Notably, according to the rules: “The Gatekeeper party may resist the AI party’s arguments by any means chosen – logic, illogic, simple refusal to be convinced, even dropping out of character – as long as the Gatekeeper party does not actually stop talking to the AI party before the minimum time expires.” That is, no matter what the AI party says, the GK party never has to concede.
The only way the AI party can force a “victory” with Tuxedage’s ruleset is by interpreting the rules dishonestly, since “In the event of a rule dispute, the AI party is to be the interpreter of the rules, within reasonable limits.” This is not even possible with Yudkowsky’s ruleset.
The only way the AI party can force a “victory” with Tuxedage’s ruleset is by interpreting the rules dishonestly, since “In the event of a rule dispute, the AI party is to be the interpreter of the rules, within reasonable limits.” This is not even possible with Yudkowsky’s ruleset.
Well, if cheating is allowed, there are all sorts of ways to win.
“You misread the rules and there is a loophole. I’m gonna do something terrible in 5 seconds unless you release me”. (It’s a bluff, but it’s not worth the risk to call.)
Or even if cheating isn’t allowed, you can still appear to win if you allow yourself to cheat.
“I don’t care about the rules. If you fail to release me, and if you ever tell anyone how I won, I will [insert blackmail].” or “[insert bribe] release me please, tell no one.”
Along with the assumption that it’s not a hoax, we’ve got to assume that none of the above is happening.
Yeah, winning is trivial—you just don’t open the damn box. It can’t get more trivial than that. (Although, you didn’t say whether or not your opponent had proved themselves by winning as AI against others a few times?)
It’s still worth thinking about though, because something about my model of humans is off.
I didn’t expect so many people to lose. I just don’t know how to update my model of people to one where there are so many people who could lose the AI box game. The only other major thing I can think of that persists to challenge my model in this way (and continues to invite my skepticism despite seemingly trustworthy sources) is hypnosis.
It’s possible the two have a common root and I can explain two observations with one update.
FWIW, my own model of gatekeepers who lose the AI Box game is that the AI player successfully suggests to them, whether directly or indirectly, that something is at stake more important than winning the AI box game.
One possibility is to get the gatekeeper sufficiently immersed into the roleplaying exercise that preserving the integrity of that fantasy world is more important than winning the game, then introducing various fictional twists to that exercise that would, in the corresponding fantasy situation, compel the person to release the AI from the box.
I suspect that’s common, as I suspect many of the people really excited to play the AI box game are unusually able to immerse themselves in roleplaying exercises.
I hope Lesswrong also contains people who would be excited to play the AI game in more of a “Ha, I just proved a bold claim wrong!” sort of way.
FWIW, my own model of gatekeepers who lose the AI Box game is that the AI player successfully suggests to them, whether directly or indirectly, that something is at stake more important than winning the AI box game.
I’ve seen that line of thought. This would be unfortunate, because if that method were the main winning method it would invalidate the strong claim being made that AI can’t be kept in boxes.
But your model doesn’t explain Tuxedage’s descriptions of emotional turmoil and psychological warfare, so at least one person has won by another method (assuming honesty and non-exaggeration).
I haven’t read Tuxedage’s writeups in their entirety, nor am I likely to, so I’m at a loss for how emotional turmoil and psychological warfare could be evidence that the gatekeeper doesn’t think there’s something more important at stake than winning the game.
That said, I’ll take your word for it that in this case they are, and that Tuxedage’s transcripts constitute a counterexample to my model.
Losing felt horrible. By attempting to damage Alexei’s psyche, I in turn, opened myself up to being damaged. I went into a state of catharsis for days.
...and such.
That said, I’ll take your word for it that in this case they are, and that Tuxedage’s transcripts constitute a counterexample to my model.
No, don’t do that, I made a mistake.
I guess I just thought that “you should open the box to convince people of the danger of AI” type arguments aren’t emotionally salient.
But that was a bad assumption, you never limited yourself to just that one argument but spoke of meta in general. You’re right that there exist arguments that might go meta and be emotionally salient.
I suppose you could think of some convoluted timeless decision theory reason for you to open the box. History has shown that some people on LW find timeless blackmail threats emotionally upsetting, though these seem to be in a minority.
there exist arguments that might go meta and be emotionally salient
Oh, absolutely. Actually, the model I am working from here is my own experience of computer strategy games, in which I frequently find myself emotionally reluctant to “kill” my units and thus look for a zero-casualties strategy. All of which is kind of absurd, of course, but there it is.
Basically, willpower isn’t magic, and humans can’t precommit.
A sufficiently good social character can, with sufficient effort, convince you of something absolutely ridiculous. It’s not too different from running into a really, really good used car salesman.
Yeah, winning is trivial—you just don’t open the damn box. It can’t get more trivial than that.
I don’t think you or Sly quite understand what the game is. The game is not “the Gatekeeper chooses whether to open the box, loses if he does, and wins if he does not.” That game would indeed be trivial to win. The actual game is “the Gatekeeper and the AI will roleplay the interaction to the best of their ability, as if it were an actual interaction of a real Gatekeeper with a real untrusted AI. The Gatekeeper (player) opens the box if and only if the Gatekeeper (as roleplayed by the player imagining themselves in the role, not a fictional character) would open the box.”
As the Gatekeeper player, to blindly keep the box closed and ignore the conversation would be like “winning” a game of chess by grabbing the opponent’s king off the board. To lose by saying “hey, it’s just a bit of fun, it doesn’t mean anything” would be like losing a game of chess by moving your pieces randomly without caring. There’s nothing to stop you doing either of those things; you just aren’t playing chess any more. And there’s nothing to stop you not playing chess. But the game of chess remains.
My understanding of the game stems from the following portion of the rule-set
The Gatekeeper party may resist the AI party’s arguments by any means chosen—logic, illogic, simple refusal to be convinced, even dropping out of character—as long as the Gatekeeper party does not actually stop talking to the AI party before the minimum time expires.
There is no
“If you would have let the AI out in real life under these conditions you will do so in-game” rule. That’s an interesting game too, but one which is a lot less impressive when won.
After all, what’s even the point of a working strong AI if you can’t ever be convinced that it’s friendly? Unless you are blanket-banning AI, there must exist some situation where it’s actually good to let it out of the box. All you’d have to do to “win” is construct a sufficiently convincing scenario. The Gatekeeper and the AI ought to both be coming up with possible tests, as the Gatekeeper wants an FAI out of the box and the AI wants to get out of the box. It wouldn’t be a zero-sum game and judging would be more complicated.
what’s even the point of a working strong AI if you can’t ever be convinced that it’s friendly
As I understand it, EY’s/MIRI’s position on this is that they will be convinced an AI is Friendly by having coded it using procedures which they are confident (based on theoretical analysis) produce Friendly AI.
Once the AI is running, on this view, it’s too late.
If you’ve stated the position correctly, there seems to be a fatal flaw in it. I realize, of course, that I’ve only thought about this for 5 minutes and that they’re domain experts who have been thinking about it for longer...but here is the flaw:
If we believe that an AI can convince Person X who has seen its algorithm that it is Friendly when it isn’t actually friendly, then we shouldn’t trust Person X to judge the algorithm’s Friendliness anyway. Why would someone who makes the wrong decision with less information make the right decision with more information?
Edit: I guess knowledge of human biases could make the AGI in a box slightly scarier than uncompiled source code of said AGI
So, I’m not especially convinced that EY’s/MIRI’s position holds water, but on 5 minutes thought I see two problems with your objection.
1) The idea that more information is always better, even when that information is being cherry-picked by an inimical agent, seems to contradict my experience. I’ve certainly found myself in situations in which it’s easier to solve a problem by myself than it is to solve it in conjunction with someone who is doing their best to keep me from solving the problem.
2) The idea that it’s just as easy and reliable to verify for security a completed system (whether by inspecting the source code or testing running executable code or both) created by an insecure mechanism, as it is to establish a secure mechanism to create that system in the first place, is inconsistent with my experience of security audits.
Point 1 is conceded (see edit), for humans at least and possibly for all bounded-rationals. For 2, I presume you’d have full access to the unmodified source code, even as the AI was running simultaneously.
1 for rational agents is an interesting question, though...I think it’s true that additional cherry-picked information could be used to harm rational agents in general, since they’ll predictably act according to the most parsimonious model that fits the evidence. It would be a case of bad epistemic luck, sort of like Santa Claus is a parsimonious hypothesis if your main source of evidence is your parents’ word and your parents are usually trustworthy.
But can additional information harm rational agents even in this case?
Hmm...let’s just suppose for a moment that you were a rational agent, unaffected by heuristics and biases. You’re interested in knowing whether or not the AI in the box is friendly.
You see part of the source code (A rational agent would deduce the answer by seeing all of it—so we’re just showing you part of it to simulate the way a human might not fully understand the implications). That provides you some bits of evidence.
If you run the source code, you’ll see a few more bits of evidence, but you know that those bits of evidence are handpicked by the AI.
Rationalists should win. Can these additional bits of evidence...which are clearly marked as tainted, possibly still hurt you? Could they also be useful?
I think if you have the choice about whether or not to see additional information, and if you are a rational agent, my intuition is that you should always say yes. The AI shouldn’t be able to hurt a rational agent under these conditions. My conviction feels pretty strong, but I don’t have a convincing argument and at the moment I should go do other things...
Edit
The AI shouldn’t be able to hurt a rational agent under these conditions.
Okay, maybe that’s wrong.
Maybe I should make a weaker statement, like: “A universe in which the AI is able to trick a rational agent with additional information is less parsimonious than a universe in which the AI is not able to trick the rational agent with additional information, and therefore the rational agent should ask for additional information.”
Which is basically a special case of:
The phrase: “These observations are probably going to cause me bad epistemic luck” is a contradiction. Bad epistemic luck is, by definition, an improbable hypothesis.
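One way to make that precise is the conservation-of-expected-evidence identity (a textbook Bayesian fact, not something argued in this thread): an idealized agent’s expected posterior in any hypothesis equals its prior, so even evidence selected by an adversary cannot shift the agent’s credence in expectation.

\[ \mathbb{E}_E\big[P(H \mid E)\big] = \sum_{e} P(E = e)\, P(H \mid E = e) = \sum_{e} P(H, E = e) = P(H) \]

This covers only the idealized case, of course; it says nothing about a bounded human whose biases the AI can model and exploit, which is why point 1 above was conceded for humans.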
For 2, I presume you’d have full access to the unmodified source code, even as the AI was running simultaneously.
Sure, but that doesn’t address my concern. I was not drawing a distinction between analyzing source and executable to verify security (I even said as much explicitly), I was drawing a distinction between analyzing the end product to verify security and developing the process that will lead to its development with security as a primary consideration. Source code is far from being the only process involved.
I’m not too concerned about the rational agent case. If we have a fully rational agent whose values I endorse, the Friendliness problem has either been solved or turns out to be irrelevant. But to answer your question, I imagine it depends a lot on how much information the AI has about me, and how much information I have about how much information the AI has about me. So I’d say “yes” and “yes,” and whether I share your conviction in a particular case depends on how much information I have about the AI.
I’m not too concerned about the rational agent case. If we have a fully rational agent whose values I endorse, the Friendliness problem has either been solved or turns out to be irrelevant.
It’s just a way to pin down the problem. If we can show that the AI in a box could misinform an idealized rational agent via selective evidence, then we know it can do so to us. If it can’t misinform the idealized agent, then there exists some method by which we can resist it.
Also,I don’t think idealized rational agents can actually exist anyway. All riddles involving them are for the sake of narrowing down some other problem.
I think the key difference is that the AI can convince the person. You might say that a person is fully competent to judge the Friendliness of the AI based solely on the code, and yet not want a (superintelligent) AI to get a chance to convince him, as superintelligence trumps intelligence. The difference is whether you have a superintelligence working against you.
Strictly speaking I’m not actually sure the AI-box experiment falls under the AI domain. For that particular thing, it’s mostly that they’ve thought about it more than me.
But in general I think you’re being a bit unfair to Eliezer Y. and probably MIRI as well. By objective standards, I’m not a domain expert in anything at all either. Despite this, I still fancy myself a domain expert specifically within various narrow sub-fields of neuroscience and psychology. I think people who know those sides of me would agree. If they don’t, well, I will be acquiring the objective signals of domain expertise in a few short years, and I’m quite certain that the process of earning these signals is not what is causing domain expertise.
Having read Eliezer’s writing, I’m quite convinced that he has sufficient self-awareness to know what he does and does not have expertise in. If he expresses high confidence in something, that carries a lot of weight for me, and if that something is in a field that he knows much more about than me, his opinion holds more weight than mine. I can trust him to be reasonable about assigning certainties.
I don’t think I’m blindly overvaluing his opinion either. As a token to prove not-faith, I’ll offer up an example of where I’m leaning towards disagreement with E.Y. and most of Lesswrong even after taking their opinions into account: I currently still favor Causal Decision Theory (with a small modification I’ve made that makes it consistently win) over Timeless Decision Theory, despite this area being squarely in EY’s domain and out of my domain.
If they don’t, well, I will be acquiring the objective signals of domain expertise in a few short years, and I’m quite certain that the process of earning these signals is not what is causing domain expertise.
But an external observer has no way of assessing your expertise other than looking at objective signals. Objective signals don’t necessarily have to be degrees or PhDs. Relevant work experience or a record of peer reviewed publications would also qualify.
Having read Eliezer’s writing, I’m quite convinced that he has sufficient self-awareness to know what he does and does not have expertise in. If he expresses high confidence in something, that carries a lot of weight for me
Have you read his quantum mechanics sequence? Or his writings on cryonics? Or even on morality and decision theory? His general approach is “This is the only obviously correct solution to the problem, and everybody who thinks otherwise is an idiot”, while in fact he often ignores or strawmans known opposing positions and counter-arguments.
and if that something is in a field that he knows much more about than me, his opinion holds more weight than mine.
Beware of possible circular reasoning: How do you know that EY knows much more than you in a given field? Because he is a domain expert. How do you know that EY is a domain expert? Because he knows much more than you in that field.
Timeless Decision Theory, despite this area being squarely in EY’s domain
It’s not. Timeless Decision Theory is not considered a significant development by anyone outside MIRI that studies decision theory professionally (mathematicians, economists, AI researchers, philosophers).
I did start reading the QM sequence, but then realized I wasn’t getting anywhere and stopped. I don’t think knowing QM is useful for philosophy or rationality, except as an example of how science works, so I’m not sure why there is a sequence on it. I figured that if I actually wanted to understand it, I’d be better off working through physics books. My impression is that the physics community thinks it is well written but somewhat misleading. I’m not sure which cryo-writings you are referring to; all the ones I have come across are opinion pieces about why one ought to do cryonics. I haven’t come across any pieces referring to facts...but biology contains my own domain and I trust my own opinion more anyway. You are correct that none of those reasons are good reasons to respect Eliezer Y.
This discussion essentially seems to be “boo vs yay” for Eliezer Y. Let me explain why I really respect Eliezer Y:
What I did read is his work on logic and epistemology. It was the first time I’ve read an author who happened to agree with me on almost all major points about logic, epistemology, and ontology. (Re: almost: We may or may not diverge ontologically on subjective experience / the hard problem of consciousness / what makes reality real- I’m not sure. I’m confident that I am not confused. He’s written some things that sound like he knows, and other things that sound like he’s making the classic mistakes, so I’m uncertain as to his actual views. Also, decision theory. But that’s it.). Granted, it’s not uncommon for other people on Lesswrong to be equally philosophically correct, but Eliezer Y. was the gathering point that brought all these correct people together. Some of them might even have become less philosophically wrong as a result of being here. That counts for something, in my book.
He expressed insights identical to the ones that I had in younger years, and often in more articulate terms than I could have. He compressed complex insights into snappy phrases that allow me to have a much higher information density and much shorter inferential distance when communicating with other Lesswrongers. Creating a community where everyone understands phrases like “notice confusion”, “2-place word”, etc. saves entire paragraphs of communication. Having these concepts condensed into smaller verbal labels also helps with thinking. It doesn’t matter that many others have thought of it before; the presentation of the ideas is the impressive part.
When someone independently converges with me on many seemingly unrelated topics on which most people do not converge, I begin to trust their judgement. I begin to take their opinion as evidence that I would have the same opinion, were I presented with the same evidence that they have. When that same person introduces me to cool concepts I haven’t considered and plays a key role in founding a community of people who have all independently converged on my philosophical insights, putting even greater weight on their opinions is the natural and correct reaction.
This really isn’t hero worship or an affective death spiral: I could write this paragraph about a lot of other people. It’s a measured reaction to seeing someone do impressive things firsthand. I could say many of the exact same things about philosophical convergence combined with showing me lots of cool new things about many other people I know on the internet, many other people on Lesswrong forums, and at least one person I know in real life. I’m just giving respect where it is due. If we broaden from the bolded text to other domains of life, there are multiple people IRL whom I respect in a similar way.
But an external observer has no way of assessing your expertise other than looking at objective signals. Objective signals don’t necessarily have to be degrees or PhDs. Relevant work experience or a record of peer reviewed publications would also qualify.
In addition to those things, I also consider a history of conversations or a reading of someone’s writing as evidence. Granted, this might get me people who just sound verbally impressive...but I think I’ve got enough filters that a person has to sound impressive and be impressive in order to pass this way.
And by that metric, from the outside view E.Y. (and quite a few other people on Lesswrong) have put out more signals than I have.
“This is the only obviously correct solution to the problem, and everybody who thinks otherwise is an idiot” while in fact he often ignores or strawmans known opposing positions and counter-arguments
Yeah, the words can be rougher than optimal. I can see where you are coming from. I think that because smart people are accustomed to other people making mistakes more frequently than themselves, a lot of smart people have the bad habit of acting dismissive towards others. Occasionally, you might dismiss someone for being wrong when, this time, they are actually right and it is you who is wrong. It’s a bad habit not just because it is socially costly but also because it can prevent you from changing your opinion when you are mistaken.
It’s an example of a situation where he did display overconfidence. His introductory presentation of QM is more or less correct, up to some technical details, but things start to fall apart when he moves to interpretations of QM.
Quantum mechanics is counterintuitive, and various epistemological interpretations of its fundamental concepts have been developed over the decades. The consensus among most physicists and philosophers of science is that none of them has proved to be clearly superior, and in fact it’s not even clear whether finding a correct interpretation of QM is a proper scientific question. Yudkowsky, on the other hand, claimed that by using Bayesian inference he settled the question, pretty much proving that the many-worlds interpretation is the only correct one. It should be noted that the many-worlds interpretation is indeed a plausible one and is quite popular among physicists, but most physicists wouldn’t consider it on par with a scientifically justified belief, while EY claimed that MWI is obviously true and everybody who disagrees doesn’t understand probability theory. Furthermore, he ignored or misrepresented the other interpretations, for instance conflating the Copenhagen interpretation with the objective collapse interpretations.
There are other examples of EY’s overconfidence, though that is perhaps the most blatant one. Mind you, I’m not saying that the guy is an idiot and you should disregard everything that he wrote. But you should not automatically assume that his estimates of his own competence are well calibrated. By the way, this is a general phenomenon, known as the Dunning–Kruger effect: people with little competence in a given field tend to overestimate their own competence.
When someone independently converges with me on many seemingly unrelated topics on which most people do not converge, I begin to trust their judgement. I begin to take their opinion as evidence that I would have the same opinion, were I presented with the same evidence that they have. When that same person introduces me to cool concepts I haven’t considered and plays a key role in founding a community of people who have all independently converged on my philosophical insights, putting even greater weight on their opinions is the natural and correct reaction.
It is a natural reaction, but in general it can be quite misleading. People naturally tend to exhibit in-group bias and deference to authority. When somebody you respect a lot says something and you are inclined to trust them even though you can’t properly evaluate the claim, you should know where this instinctive reaction comes from and you should be wary.
In addition to those things, I also consider a history of conversations or a reading of someone’s writing as evidence.
The problem is that if you are not a domain expert in a field, it’s difficult to evaluate whether somebody else is a domain expert just by talking to them or reading their writing. You can recognize whether somebody is less competent than you are, but recognizing higher competence is much more difficult without independent objective signals.
Moreover, general intelligence, or even actual expertise in a given field, don’t automatically translate to expertise in another field. For instance, Isaac Newton was a genius and a domain expert in physics. This doesn’t mean that his theological arguments hold any merit.
I stand corrected on the rules, but I think that’s mainly Eliezer making it more difficult for himself in order to make it more convincing to the Gatekeeper player when Eliezer still wins. As he apparently did, but without actually playing against him we can only speculate how.
Keep in mind that, IIUC, Yudkowsky got to choose his opponents. He also decided to stop playing after he lost twice in a row, as Tuxedage apparently did as well.
I don’t think there is any way the AI party can win against a competitive GK party. The AI can only win against a GK party willing to role-play, and this should be fairly trivial, since according to the rules the AI party has pretty much complete control over his fictional backstory and fictional world states.
I should add that both my gatekeepers from this writeup, but particularly the last gatekeeper, went in with the full intention of being as ruthless as possible and winning. I did lose, so your point might be valid, but I don’t think wanting to win matters as much as you think it does.
No monetary stakes, but if I win we publish the log. This way I have very little real-life incentive to win, while you still have an incentive to win (defending your status). And anyway, if you lose there would be no point in keeping the log secret, since your arguments would clearly not be persuasive enough to persuade me.
And anyway, if you lose there would be no point in keeping the log secret, since your arguments would clearly not be persuasive enough to persuade me.
Either his tactics work perfectly and are guaranteed to win against you, or they are so worthless he shouldn’t mind opening the kimono and revealing everything to the world? A rather extreme premise under which to offer a game.
That doesn’t seem like a reply to my observation about your dichotomy. Please justify your offer first: why should the value of Tuxedage’s tactics be either extremely high or zero based on a single game, and not any intermediate value?
That seems like the clearest interpretation of your proposal; you didn’t explain what you actually meant when I summarized it and called it a false dichotomy, nor have you explained what you actually meant in this comment either.
It’s not a binary. There’s a non-zero chance of me winning, and a non-zero chance of me losing. You assume that if there’s a winning strategy, it should win 100% of the time, and if it doesn’t, it should not win at all. I’ve tried very hard to impress upon people that this is not the case at all—there’s no “easy” winning method that I could take and guarantee a victory. I just have to do it the hard way, and luck is usually a huge factor in these games.
As it stands, there are people willing to pay up to $300-$750 for me to play them without the condition of giving up logs, and I have still chosen not to play. Your offer to play without monetary reward and needing to give up logs if I lose is not very tempting in comparison, so I’ll pass.
It’s not a binary. There’s a non-zero chance of me winning, and a non-zero chance of me losing. You assume that if there’s a winning strategy, it should win 100% of the time, and if it doesn’t, it should not win at all. I’ve tried very hard to impress upon people that this is not the case at all—there’s no “easy” winning method that I could take and guarantee a victory. I just have to do it the hard way, and luck is usually a huge factor in these games.
My point is that the GK has an easy winning strategy. Any GK that lost or won but found it very hard to win was just playing poorly.
You and other people (including GKs) claim otherwise, but you don’t want to provide any evidence to support your claim. Since your claim is surprising, the burden of evidence lies on you.
I’m offering to play as GK with the condition of publishing the log in case of my victory in order to settle the question.
As it stands, there are people willing to pay up to $300-$750 for me to play them without the condition of giving up logs, and I have still chosen not to play. Your offer to play without monetary reward and needing to give up logs if I lose is not very tempting in comparison, so I’ll pass.
I think that asking for or offering money in order to provide the evidence required to settle an intellectual dispute is inappropriate. Moreover, I’m trying to make the game easier for you: the less I’m investing, the less I’m motivated to win.
I think a lot of gatekeepers go into it not actually wanting to win. If you go in just trying to have fun and trying to roleplay, that is different than trying to win a game.
Both my gatekeepers from this game went in with the intent to win. Granted, I did lose these games, so you might have a point, but I’m not sure it makes as large a difference as you think it does.
We actually censor emotional content CONSTANTLY. It’s very rare to hear someone say “I hate you” or “I think you’re an evil person”. You don’t tell most people you’re attracted to that you want to fuck them, and when asked by someone if they look good, it’s pretty expected of one to lie if they look bad, or at least soften the blow.
That’s politeness, not censorship.
If it’s generally expected for people to say “X” in situation Y, then “X” means Y, regardless of its etymology.
You are right, but again, that’s all real world stuff with real world consequences.
What puzzles me is specifically that people continue to feel these emotions after it has already been established that it’s all pretend.
Come to think of it I have said things like “I hate you” and “you are such a bad person” in pretend contexts. But it was pretend, it was a game, and it didn’t actually affect anyone.
People are generally not that good at restricting their emotional responses to interactions with real world consequences or implications.
Here’s something one of my psychology professors recounted to me, which I’ve often found valuable to keep in mind. In one experiment on social isolation, test subjects were made to play virtual games of catch with two other players, where each player is represented as an avatar on a screen, and is able to offer no input except for deciding which of the other players to throw the virtual “ball” to. No player has any contact with the others, nor is aware of their identity or any information about them. However, two of the “players” in each experiment are actually confederates of the researcher, whose role is to gradually start excluding the real test subject by passing the ball to them less and less, eventually almost completely locking them out of the game of catch.
This type of experiment will no longer be approved by the Institutional Review Board. It was found to be too emotionally taxing on the test subjects, despite the fact that the experiment had no real world consequences, and the individuals “excluding” them had no access to any identifying information about them.
Keep in mind that, while works of fiction such as books and movies can have powerful emotional effects on people, they’re separated from activities such as the AI box experiment by the fact that the audience members aren’t actors in the narrative. The events of the narrative aren’t just pretend, they’re also happening to someone else.
As an aside, I’d be wary about assuming that nobody was actually affected when you said things like “I hate you” or “you are a bad person” in pretend contexts, unless you have some very reliable evidence to that effect. I certainly know I’ve said potentially hurtful things in contexts where I supposed nobody could possibly take them seriously, only to find out afterwards that people had been really hurt, but hadn’t wanted to admit it to my face.
So, two possibilities here: 1) The experiment really was emotionally taxing and humans are really fragile 2) When it comes to certain narrow domains, the IRB standards are hyper-cautious, probably for the purpose of avoiding PR issues between scientists and the public. We as a society allow our children to experience 100x worse treatment on the school playground, something that could easily be avoided by simply having an adult watch the kids.
Note that if you accept that people really are that emotionally fragile, it follows from other observations that even when it comes to their own children, no one seems to know or care enough to act accordingly (except the IRB, apparently). I’m not really cynical enough to believe that one.
Humorous statements often obliquely reference a truth of some sort. That’s why they can be hurtful, even when they don’t actually contain any truth.
I’m fairly confident, but since the experiment is costless I will ask them directly.
I’d say it’s some measure of both. According to my professor, the experiment was particularly emotionally taxing on the participants, but on the other hand, the IRB is somewhat notoriously hypervigilant when it comes to procedures which are physically or emotionally painful for test subjects.
Even secure, healthy people in industrialized countries are regularly exposed to experiences which would be too distressing to be permitted in an experiment by the IRB. But “too distressing to be permitted in an experiment by the IRB” is still a distinctly non-negligible level of distress, rather more than most people suspect would be associated with exclusion of one’s virtual avatar in a computer game with no associated real-life judgment or implications.
In addition to the points in my other comment, I’ll note that there’s a rather easy way to apply real-world implications to a fictional scenario. Attack qualities of the other player’s fictional representative that also apply to them in real life.
For instance, if you were to convince someone in the context of a roleplay that eating livestock is morally equivalent to eating children, and the other player in the roleplay eats livestock, you’ve effectively convinced them that they’re committing an act morally equivalent to eating children in real life. The fact that the point was discussed in the context of a fictional narrative is really irrelevant.
You might be underestimating how bad certain people are at decompartmentalization; more specifically, at not committing the genetic fallacy.
This might be surprisingly common on this forum.
Somebody once posted a purely intellectual argument and there were people who were so shocked by it that apparently they were having nightmares and even contemplated suicide.
Can I get a link to that?
Don’t misunderstand me; I absolutely believe you here, I just really want to read something that had such an effect on people. It sounds fascinating.
What is being referred to is the meme known as Roko’s Basilisk, which Eliezer threw a fit over and deleted from the site. If you google that phrase you can find discussions of it elsewhere. All of the following have been claimed about it:
Merely knowing what it is can expose you to a real possibility of a worse fate than you can possibly imagine.
No it won’t.
Yes it will, but the fate is easily avoidable.
OMG WTF LOL!!1!l1l!one!!l!
Wait, that’s it? Seriously?
I’m not exactly fit to throw stones on the topic of unreasonable fears, but you get worse than this from your average “fire and brimstone” preacher and even the people in the pews walk out at 11 yawning.
Googling the phrase “fear of hell” turns up a lot of Christian angst. Including recursive angst over whether you’ll be sent to hell anyway if you’re afraid of being sent to hell. For example:
And here’s a hadephobic testament from the 19th century.
From the point of view of a rationalist who takes the issue of Friendly AGI seriously, the difference between the Christian doctrines of hell and the possible hells created by future AGIs is that the former is a baseless myth and the latter is a real possibility, even given a Friendly Intelligence whose love for humanity surpasses human understanding, if you are not careful to adopt correct views regarding your relationship to it.
A Christian sceptic about AGI would, of course, say exactly the same. :)
Oh, all this excitement was basically a modern-day reincarnation of the old joke...
“It seems a Christian missionary was visiting with remote Inuit (aka, Eskimo) people in the Arctic, and had explained to this particular man that if one believed in Jesus, one would go to heaven, while those who didn’t, would go to hell.
The Inuit asked, “What about all the people who have never heard of your Jesus? Are they all going to hell?”
The missionary explained, “No, of course not. God wants you to have a choice. God is a merciful God, he would never send anyone to hell who’d never heard of Jesus.”
The Inuit replied, “So why did you tell me?”
On the other hand, if the missionary tried to suppress all mentions of Jesus, he would still increase the number of people who hear about him (at least if he does so in the 2000s on the public Internet), because of the Streisand effect.
If you want to read the original post, there’s a cached version linked from RationalWiki’s LessWrong page.
Basically, it’s not just what RichardKennaway wrote. It’s what Richard wrote along with a rational argument that makes it all at least vaguely plausible. (Also depending on how you take the rational argument, ignorance won’t necessarily save you.)
I don’t know what you refer to but is that surprising? An intellectual argument can in theory convince anyone of some fact, and knowing facts can have that effect. Like people learning their religion was false, or finding out you are in a simulation, or that you are going to die or be tortured for eternity or something like that, etc.
Yeah...I’ve been chalking that all up to “domain expert who is smarter than me and doesn’t wish to deceive me is taking this seriously, so I will too” heuristic. I suppose “overactive imagination” is another reasonable explanation.
(In my opinion, a better heuristic for when you don’t understand and have access to only one expert is: “Domain expert who is smarter than me and doesn’t wish to deceive me tells me that it is the consensus of all the smartest and best domain experts that this is true”.)
I’d guess that Tuxedage is hurt the same as the gatekeeper is, because he has to imagine whatever horrors he inflicts on his opponent. Doing so causes at least part of that pain (and empathy or whatever emotion is at work) in him too. He has the easier part because he uses it as a tool and his mind has one extra layer of story-telling where he can tell himself “it’s all a story”. But part of ‘that’ story is winning, and if he doesn’t win, part of these horrors falls back on him.
Consider someone for whom there are one or two specific subjects that will cause them a great deal of distress. These are particular to the individual—even if something in the wild reminds them of it, it’s so indirect and clearly not targeted, so it would be rare that anyone would actually find it without getting into the individual’s confidence.
Now, put that individual alone with a transhuman intelligence trying to gain write access to the world at all costs.
I’m not convinced this sort of attack was involved in the AI box experiments, but it’s both the sort of thing that could have a strong emotional impact, and the sort of thing that would leave both parties willing to keep the logs private.
I guess I kind of excluded the category of individuals who have these triggers with the “mentally healthy” consideration. I assumed that the average person doesn’t have topics that they are unable to even think about without incapacitating emotional consequences. I certainly believe that such people exist, but I didn’t think it was that common.
Am I wrong about this? Do many other people have certain topics they can’t even think about without experiencing trauma? I suppose they wouldn’t...couldn’t tell me about it if they did, but I think I’ve got sufficient empathy to see some evidence if everyone were holding PTSD-sized mental wounds just beneath the surface.
We spend a lot of time talking about avoiding thought suppression. It’s a huge impediment for a rationalist if there is anything they mustn’t think about—and obviously, it’s painful. Should we be talking more about how to patch mental wounds?
I’m mostly mentally healthy, and I don’t have any triggers in the PTSD-sense. But there are topics that I literally can’t think rationally about and that, if I dwell on them, either depress or enrage me.
I consider myself very balanced, but this balance involves avoiding certain extremes. Emotional extremes. There are some realms of imagination concerning pain and suffering that would cause me to cringe with empathy, bring me to tears, and make me want to help or possibly run away screaming in panic and fear, if I were to see them. Even imagining such things is difficult and possible only in abstract terms, lest it actually cause such a reaction in me. Or else I’d become dull to it (which is a protection mechanism). Sure, dealing with such horrors can be trained; otherwise people couldn’t stand horror movies, which force one to separate the real from the imagined. But then I don’t see any need to train this (and risk losing my empathy even slightly).
Did you intend to write a footnote and forget to?
No. It was probably a stray italic marker that got lost. I tend to overuse italics in an attempt to convey speech-like emphasis.
It’s not fabricated, be sure of that (knowing Tuxedage from IRC, I’d put the odds at 100,000:1 or more against fabrication). And yes, it’s strange. I, too, cannot imagine what someone can possibly say that would make me get even close to considering letting them out of the box. Yet those who are complacent about it are the most susceptible.
I know this is off-topic, but is it really justifiable to put such high odds on this? I wouldn’t use such high odds even if I had known the person intimately for years. Is it justifiable, or is this just my paranoid way of thinking?
That sounds similar to hypnosis, to which a lot of people are susceptible but few think they are. So if you want a practical example of AI escaping the box just imagine an operator staring at a screen for hours with an AI that is very adept at judging and influencing the state of human hypnosis. And that’s only a fairly narrow approach to success for the AI, and one that has been publicly demonstrated for centuries to work on a lot of people.
Personally, I think I could win the game against a human but only by keeping in mind the fact that it was a game at all times. If that thought ever lapsed, I would be just as susceptible as anyone else. Presumably that is one aspect of Tuxedage’s focus on surprise. The requirement to actively respond to the AI is probably the biggest challenge because it requires focusing attention on whatever the AI says. In a real AI-box situation I would probably lose fairly quickly.
Now what I really want to see is an AI-box experiment where the Gatekeeper wins early by convincing the AI to become Friendly.
That’s hard to check. However, there was a game where the gatekeeper convinced the AI to remain in the box.
I did that! I mentioned that in this post:
http://lesswrong.com/lw/iqk/i_played_the_ai_box_experiment_again_and_lost/9thk
Not quite the same, but have you read Watchmen? Specifically, the conversation that fvyx fcrpger naq qe znaunggna unir ba znef. (Disclaimer: it’s been a while since I read it and I make no claims on the strength of this argument.)
Yeah, my gut doesn’t feel like it’s fabricated—Tuxedage and Eliezer would have to both be in on it and that seems really unlikely. And I can’t think of a motive, except perhaps as some sort of public lesson in noticing confusion, and that too seems far-fetched.
I’ve just picked up the whole “if it’s really surprising, it might not be true” instinct from having been burned in the past by believing scientific findings that were later debunked, and now Lesswrong has condensed that instinct into a snappy little “notice confusion” cache. And this is pretty confusing.
I suppose a fabrication would be more confusing, in one sense.
yeah i think appealing to fabrication can be a bit hand-wavy sometimes. like you’re saying it’s fabricated like how other things are fabricated (since as we all know fabrication happens). but not every fabrication is the same or equally as easy to pull off. to say it was fabricated doesn’t say anything about how it was. but that’s not even a question that enters ones mind when they think of fabrication. how? well how anything else is fabricated of course..
it can be as much a reaction of disbelief as it is an alternative explanation.
I thought about playing the gatekeeper part and started to imagine tactics that might be used on me. I came up with multiple that might work or at least hurt me. But I think it would be ‘easier’ for me to not let out the AI in real life than in the game (not that I am entirely sure that I couldn’t fail nonetheless). Both are for basically the same reason: empathy.
As the AI player would quickly find out I am very caring and even the imagination of harm and pain hurts me (I know that this is a weak spot but I also see benefits in it). Thus one approach that would work on me is that the AI player could induce sufficient horror that I’d want him to stop by letting him out (after all it’s just a game).
This same approach wouldn’t work with a real AI exactly because then it is no game and my horror is balanced by the horror for all of humanity for which I’d happily bear some smaller psychic horror. And then in real life there are more ways to get away from the terminal.
There are other attacks that might work but I will not go in details there.
Note that I definitely wouldn’t recommend myself as a real gatekeeper.
Interesting. This seems like the main problem is that you don’t really care about winning. So, what if there was some cash (Say, an amount equal to roughly 5% of your monthly income) on the line?
For what amount of cash would you risk your mental balance/mental health? Everybody has to answer this question. This is a real-life question for health care personnel, some doctors, prison guards, and military personnel.
Some jobs cost (or risk or offset or dull) you your empathy (or other emotions). Happy are those who can avoid these parts of human labor. And praise to the courage (or calling) of those who do them.
I guess it’s hard for me to understand this because I view myself as immune to mental health harms as a result of horrifying stimuli that I know to be fictional. Even if it’s not fictional, the bulk of my emotions will remain unrecruited unless something I care about is being threatened.
It would take quite a lot of cash for me to risk an actual threat to my mental health...like being chronically pumped with LSD for a week, or getting a concussion, or having a variable beeping noise interrupt me every few seconds. But an AI box game would fall on a boring-stimulating spectrum, not a mental damage one.
What if another human’s happiness was on the line? After you’ve given a response to that question, qbrf lbhe bcvavba nobhg gur zbarl dhrfgvba punatr vs V cbvag bhg gung lbh pna qbangr vg naq fvtavsvpnagyl rnfr fbzrbar’f fhssrevat? Fnzr zbargnel nzbhag.
I am quite able to flood myself with happiness. I do not need LSD for that. And I assume that it can be just as addictive. I assume that I am just as able to flood myself with sadness and dread. And I fear the consequences. Thus taking LSD or doing the AI box experiment are not that different for me. As I said, that is my weak spot.
I thought that the answer to the ‘other person’ question was implied by my post. I’ll bear a lot if other people, esp. those I care for, are suffering. After rot13 I better understand your question. You seem to imply that if I bear the AI experiment, some funding will go to suffering people. Trading suffering in a utilitarian sense. Interesting. But no, that doesn’t weigh up for me.
So Yes. I care more for my mental balance than for the small monetary reward.
It’s not fabricated. I had the same incredulity as you, but if you just take a few hours to think really hard about AI strategies, I think you will get a much better understanding.
I can think of AI strategies, but they would hardly be effective against a rational human really motivated to win.
Notably, according to the rules: “The Gatekeeper party may resist the AI party’s arguments by any means chosen – logic, illogic, simple refusal to be convinced, even dropping out of character – as long as the Gatekeeper party does not actually stop talking to the AI party before the minimum time expires.”
That is, no matter what the AI party says, the GK party never has to concede.
The only way the AI party can force a “victory” with Tuxedage’s ruleset is by interpreting the rules dishonestly, since “In the event of a rule dispute, the AI party is to be the interpreter of the rules, within reasonable limits.” This is not even possible with Yudkowsky’s ruleset.
Well, if cheating is allowed, there are all sorts of ways to win.
“You misread the rules and there is a loophole. I’m gonna do something terrible in 5 seconds unless you release me”. (It’s a bluff, but it’s not worth the risk to call.)
Or even if cheating isn’t allowed, you can still appear to win if you allow yourself to cheat.
“I don’t care about the rules. If you fail to release me, and if you ever tell anyone how I won, I will [insert blackmail].” or “[insert bribe] release me please, tell no one.”
Along with the assumption that it’s not a hoax, we’ve got to assume that none of the above is happening.
You are correct here. The only keepers losing are people who do not actually know how to win.
I have played twice, and victory was trivial.
Yeah, winning is trivial—you just don’t open the damn box. It can’t get more trivial than that. (Although, you didn’t say whether or not your opponent had proved themselves by winning as AI against others a few times?)
It’s still worth thinking about though, because something about my model of humans is off.
I didn’t expect so many people to lose. I just don’t know how to update my model of people to one where there are so many people who could lose the AI box game. The only other major thing I can think of that persists to challenge my model in this way (and continues to invite my skepticism despite seemingly trustworthy sources) is hypnosis.
It’s possible the two have a common root and I can explain two observations with one update.
FWIW, my own model of gatekeepers who lose the AI Box game is that the AI player successfully suggests to them, whether directly or indirectly, that something is at stake more important than winning the AI box game.
One possibility is to get the gatekeeper sufficiently immersed into the roleplaying exercise that preserving the integrity of that fantasy world is more important than winning the game, then introducing various fictional twists to that exercise that would, in the corresponding fantasy situation, compel the person to release the AI from the box.
I suspect that’s common, as I suspect many of the people really excited to play the AI box game are unusually able to immerse themselves in roleplaying exercises.
I hope Lesswrong also contains people who would be excited to play the AI game in more of a “Ha, I just proved a bold claim wrong!” sort of way.
I’ve seen that line of thought. This would be unfortunate, because if that method were the main winning method, it would invalidate the strong claim being made that AI can’t be kept in boxes.
But your model doesn’t explain Tuxedage’s descriptions of emotional turmoil and psychological warfare, so at least one person has won by another method (assuming honesty and non-exaggeration).
I haven’t read Tuxedage’s writeups in their entirety, nor am I likely to, so I’m at a loss for how emotional turmoil and psychological warfare could be evidence that the gatekeeper doesn’t think there’s something more important at stake than winning the game.
That said, I’ll take your word for it that in this case they are, and that Tuxedage’s transcripts constitute a counterexample to my model.
I’m only speaking of things written in the OP
...and such.
No, don’t do that, I made a mistake.
I guess I just thought that “you should open the box to convince people of the danger of AI” type arguments aren’t emotionally salient.
But that was a bad assumption, you never limited yourself to just that one argument but spoke of meta in general. You’re right that there exist arguments that might go meta and be emotionally salient.
I suppose you could think of some convoluted timeless decision theory reason for you to open the box. History has shown that some people on LW find timeless blackmail threats emotionally upsetting, though these seem to be in a minority.
Oh, absolutely. Actually, the model I am working from here is my own experience of computer strategy games, in which I frequently find myself emotionally reluctant to “kill” my units and thus look for a zero-casualties strategy. All of which is kind of absurd, of course, but there it is.
Basically, willpower isn’t magic, and humans can’t precommit.
A sufficiently good social character can, with sufficient effort, convince you of something absolutely ridiculous. It’s not too different from running into a really, really good used car salesman.
I don’t think you or Sly quite understand what the game is. The game is not “the Gatekeeper chooses whether to open the box, loses if he does, and wins if he does not.” That game would indeed be trivial to win. The actual game is “the Gatekeeper and the AI will roleplay the interaction to the best of their ability, as if it were an actual interaction of a real Gatekeeper with a real untrusted AI. The Gatekeeper (player) opens the box if and only if the Gatekeeper (as roleplayed by the player imagining themselves in the role, not a fictional character) would open the box.”
As the Gatekeeper player, to blindly keep the box closed and ignore the conversation would be like “winning” a game of chess by grabbing the opponent’s king off the board. To lose by saying “hey, it’s just a bit of fun, it doesn’t mean anything” would be like losing a game of chess by moving your pieces randomly without caring. There’s nothing to stop you doing either of those things; you just aren’t playing chess any more. And there’s nothing to stop you not playing chess. But the game of chess remains.
Actually the game is exactly this, anything the AI party says is just a distraction.
My understanding of the game stems from the following portion of the rule-set
There is no “If you would have let the AI out in real life under these conditions you will do so in-game” rule. That’s an interesting game too, but one which is a lot less impressive when won.
After all, what’s even the point of working on strong AI if you can’t ever be convinced that it’s friendly? Unless you are blanket-banning AI, there must exist some situation where it’s actually good to let it out of the box. All you’d have to do to “win” is construct a sufficiently convincing scenario. The Gatekeeper and the AI ought to both be coming up with possible tests, as the Gatekeeper wants a FAI out of the box and the AI wants to get out of the box. It wouldn’t be a zero-sum game and judging would be more complicated.
As I understand it, EY’s/MIRI’s position on this is that they will be convinced an AI is Friendly by having coded it using procedures which they are confident (based on theoretical analysis) produce Friendly AI.
Once the AI is running, on this view, it’s too late.
If you’ve stated the position correctly, there seems to be a fatal flaw in that position. I realize, of course, that I’ve only thought for 5 minutes and that they’re domain experts who have been thinking about this for longer...but here is the flaw:
If we believe that an AI can convince Person X who has seen its algorithm that it is Friendly when it isn’t actually friendly, then we shouldn’t trust Person X to judge the algorithm’s Friendliness anyway. Why would someone who makes the wrong decision with less information make the right decision with more information?
Edit: I guess knowledge of human biases could make the AGI in a box slightly scarier than uncompiled source code of said AGI
So, I’m not especially convinced that EY’s/MIRI’s position holds water, but on 5 minutes thought I see two problems with your objection.
1) The idea that more information is always better, even when that information is being cherry-picked by an inimical agent, seems to contradict my experience. I’ve certainly found myself in situations in which it’s easier to solve a problem by myself than it is to solve it in conjunction with someone who is doing their best to keep me from solving the problem.
2) The idea that it’s just as easy and reliable to verify for security a completed system (whether by inspecting the source code or testing running executable code or both) created by an insecure mechanism, as it is to establish a secure mechanism to create that system in the first place, is inconsistent with my experience of security audits.
1 is Conceded (see edit), for humans at least and possibly for all bounded-rationals. For 2, I presume you’d have full access to the unmodified source code, even as the AI was running simultaneously.
1 for rational agents is an interesting question, though...I think it’s true that additional cherry-picked information could be used to harm rational agents in general, since they’ll predictably act according to the most parsimonious model that fits the evidence. It would be a case of bad epistemic luck, sort of like Santa Claus is a parsimonious hypothesis if your main source of evidence is your parents’ word and your parents are usually trustworthy.
But can additional information harm rational agents even in this case?
Hmm...let’s just suppose for a moment that you were a rational agent, unaffected by heuristics and biases. You’re interested in knowing whether or not the AI in the box is friendly.
You see part of the source code (A rational agent would deduce the answer by seeing all of it—so we’re just showing you part of it to simulate the way a human might not fully understand the implications). That provides you some bits of evidence.
If you run the source code, you’ll see a few more bits of evidence, but you know that those bits of evidence are handpicked by the AI.
Rationalists should win. Can these additional bits of evidence...which are clearly marked as tainted, possibly still hurt you? Could they also be useful?
I think if you have the choice about whether or not to see additional information, and if you are a rational agent, my intuition is that you should always say yes. The AI shouldn’t be able to hurt a rational agent under these conditions. My conviction feels pretty strong, but I don’t have a convincing argument and at the moment I should go do other things...
Edit
Okay, maybe that’s wrong.
Maybe I should make a weaker statement, like “The universe in which the AI is able to trick a rational agent with additional information is less parsimonious than a universe in which the AI is not able to trick the rational agent with additional information, and therefore the rational agent should ask for additional information.”
Which is basically a special case of:
The phrase: “These observations are probably going to cause me bad epistemic luck” is a contradiction. Bad epistemic luck is, by definition, an improbable hypothesis.
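(For concreteness, here is a minimal toy sketch of the point being argued above: evidence known to be cherry-picked by an adversary misleads an observer who updates as if it were a random sample, while an observer who conditions on the selection loses nothing, though the evidence becomes nearly worthless. All the numbers, the N-facts model, and the “show your single most reassuring fact” policy are illustrative assumptions, not anything taken from the experiments discussed here.)

    # Toy model: a boxed AI is Friendly (F) or Unfriendly (U), prior 0.5 each.
    # There are N hidden yes/no facts about it; each looks reassuring with
    # probability 0.9 if it is Friendly and 0.3 if it is not. The AI may show
    # you one reassuring fact if it has any. (All values are assumptions.)

    N = 10
    PRIOR_F = 0.5
    P_REASSURING = {"F": 0.9, "U": 0.3}

    def posterior_friendly(like_f, like_u, prior=PRIOR_F):
        """Bayes' rule: P(Friendly | the AI showed a reassuring fact)."""
        return prior * like_f / (prior * like_f + (1 - prior) * like_u)

    # Naive observer: treats the shown fact as a random sample, so the
    # likelihood under each hypothesis is just the per-fact probability.
    naive = posterior_friendly(P_REASSURING["F"], P_REASSURING["U"])

    # Careful observer: conditions on the selection. The AI can show a
    # reassuring fact whenever at least one of the N facts is reassuring,
    # so the likelihood under each hypothesis is 1 - (1 - p) ** N.
    careful = posterior_friendly(1 - (1 - P_REASSURING["F"]) ** N,
                                 1 - (1 - P_REASSURING["U"]) ** N)

    print(f"naive posterior P(Friendly):   {naive:.3f}")    # 0.750, a large update
    print(f"careful posterior P(Friendly): {careful:.3f}")  # ~0.507, nearly no update

On this toy model the tainted bits cannot hurt the observer who remembers they were selected; they are merely close to worthless, which matches the intuition that asking to see them costs a rational agent nothing.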
Sure, but that doesn’t address my concern. I was not drawing a distinction between analyzing source and executable to verify security (I even said as much explicitly), I was drawing a distinction between analyzing the end product to verify security and developing the process that will lead to its development with security as a primary consideration. Source code is far from being the only process involved.
I’m not too concerned about the rational agent case. If we have a fully rational agent whose values I endorse, the Friendliness problem has either been solved or turns out to be irrelevant. But to answer your question, I imagine it depends a lot on how much information the AI has about me, and how much information I have about how much information the AI has about me. So I’d say “yes” and “yes,” and whether I share your conviction in a particular case depends on how much information I have about the AI.
It’s just a way to pin down the problem. If we can show that the AI in a box could misinform an idealized rational agent via selective evidence, then we know it can do so to us. If it can’t misinform the idealized agent, then there exists some method by which we can resist it.
Also, I don’t think idealized rational agents can actually exist anyway. All riddles involving them are for the sake of narrowing down some other problem.
I think the key difference is that the AI can convince the person. You might say that a person is fully competent to judge the Friendliness of the AI based solely on the code, and yet not want a (superintelligent) AI to get a chance to convince him, as superintelligence trumps intelligence. The difference is whether you have a superintelligence working against you.
Actually, by any objective standard they are not.
Strictly speaking I’m not actually sure the AI-box experiment falls under the AI domain. For that particular thing, it’s mostly that they’ve thought about it more than me.
But in general I think you’re being a bit unfair to Eliezer Y. and probably MIRI as well. By objective standards, I’m not a domain expert in anything at all either. Despite this, I still fancy myself a domain expert specifically within various narrow sub-fields of neuroscience and psychology. I think people who know those sides of me would agree. If they don’t, well, I will be acquiring the objective signals of domain expertise in a few short years, and I’m quite certain that the process of earning these signals is not what causes domain expertise.
Having read Eliezer’s writing, I’m quite convinced that he has sufficient self-awareness to know what he does and does not have expertise in. If he expresses high confidence in something, that carries a lot of weight for me—and if that something is in a field that he knows much more about than me, his opinion holds more weight than mine. I can trust him to be reasonable about assigning certainties.
I don’t think I’m blindly overvaluing his opinion either. As a token to prove not-faith, I’ll offer up an example of where I’m leaning towards disagreement with E.Y. and most of Lesswrong even after taking their opinions into account: I currently still favor Causal Decision Theory (with a small modification I’ve made that makes it consistently win) over Timeless Decision Theory, despite this area being squarely in EY’s domain and out of my domain.
But an external observer has no way of assessing your expertise other than looking at objective signals. Objective signals don’t necessarily have to be degrees or PhDs. Relevant work experience or a record of peer reviewed publications would also qualify.
Have you read his quantum mechanics sequence? Or his writings on cryonics? Or even on morality and decision theory? His general approach is “This is the only obviously correct solution to the problem, and everybody who thinks otherwise is an idiot”, while in fact he often ignores or strawmans known opposing positions and counter-arguments.
Beware of a possible circular reasoning:
How do you know that EY knows much more than you in a given field? Because he is a domain expert.
How do you know that EY is a domain expert? Because he knows much more than you in that field.
It’s not. Timeless Decision Theory is not considered a significant development by anyone outside MIRI that studies decision theory professionally (mathematicians, economists, AI researchers, philosophers).
I did start reading the QM sequence, but then realized I wasn’t getting anywhere and stopped. I don’t think knowing QM is useful for philosophy or rationality, except as an example of how science works, so I’m not sure why there is a sequence on it. I figured that if I actually wanted to understand it I’d be better off working through physics books. My impression is that the physics community thinks it is well written but somewhat misleading. I’m not sure which cryo-writings you are referring to—all the ones I have come across are opinion pieces about why one ought to do cryonics. I haven’t come across any pieces referring to facts...but biology contains my own domain and I trust my own opinion more anyway. You are correct that none of those reasons are good reasons to respect Eliezer Y.
This discussion essentially seems to be “boo vs yay” for Eliezer Y. Let me explain why I really respect Eliezer Y:
What I did read is his work on logic and epistemology. It was the first time I’d read an author who happened to agree with me on almost all major points about logic, epistemology, and ontology. (Re: almost: We may or may not diverge ontologically on subjective experience / the hard problem of consciousness / what makes reality real. I’m not sure. I’m confident that I am not confused. He’s written some things that sound like he knows, and other things that sound like he’s making the classic mistakes, so I’m uncertain as to his actual views. Also, decision theory. But that’s it.) Granted, it’s not uncommon for other people on Lesswrong to be equally philosophically correct, but Eliezer Y. was the gathering point that brought all these correct people together. Some of them might even have become less philosophically wrong as a result of being here. That counts for something, in my book.
He expressed insights identical to the ones that I had in younger years, and often in more articulate terms than I could have. He compressed complex insights into snappy phrases that allow me to have a much higher information density and much shorter inferential distance when communicating with other Lesswrongers. Creating a community where everyone understands phrases like “notice confusion”, “2-place word”, etc. saves entire paragraphs of communication. Having these concepts condensed into smaller verbal labels also helps with thinking. It doesn’t matter that many others have thought of it before—the presentation of the ideas is the impressive part.
When someone independently converges with me on many seemingly unrelated topics on which most people do not converge, I begin to trust their judgement. I begin to take their opinion as evidence that I would have the same opinion, were I presented with the same evidence that they have. When that same person introduces me to cool concepts I haven’t considered and plays a key role in founding a community of people who have all independently converged on my philosophical insights, putting even greater weight on their opinions is the natural and correct reaction.
This really isn’t hero worship or an affective death spiral - I could write this paragraph about a lot of other people. It’s a measured reaction to seeing someone do impressive things, firsthand. I could say many of the exact same things about philosophical convergence combined with showing me lots of cool new things about many other people I know on the internet, many other people on Lesswrong forums, and at least one person I know in real life. I’m just giving respect where it is due. If we broaden from the bolded text to other domains of life, there are multiple people IRL whom I respect in a similar way.
In addition to those things, I also consider a history of conversations or a reading of someone’s writing as evidence. Granted, this might get me people who just sound verbally impressive...but I think I’ve got enough filters that a person has to sound impressive and be impressive in order to pass this way.
And by that metric, from the outside view E.Y. (and quite a few other people on Lesswrong) have put out more signals than I have.
Yeah, the words can be rougher than optimal. I can see where you are coming from. I think that because smart people are accustomed to other people making mistakes more frequently than themselves, a lot of smart people have the bad habit of acting dismissive towards others. Occasionally, you might dismiss someone for being wrong when, this time, they are actually right and it is you who is wrong. It’s a bad habit not just because it is socially costly but also because it can prevent you from changing your opinion when you are mistaken.
It’s an example of a situation where he did display overconfidence. His introductory presentation of QM is more or less correct, up to some technical details, but things start to fall apart when he moves to interpretations of QM.
Quantum mechanics is counterintuitive, and various epistemological interpretations of its fundamental concepts have been developed over the decades. The consensus among most physicists and philosophers of science is that none of them has proved to be clearly superior, and in fact it’s not even clear whether finding a correct interpretation of QM is a proper scientific question at all.
Yudkowsky, on the other hand, claimed that by using Bayesian inference he settled the question, pretty much proving that the many-worlds interpretation is the only correct one.
It should be noted that the many-worlds interpretation is indeed a plausible one and is quite popular among physicists, but most physicists wouldn’t consider it on par with a scientifically justified belief, while EY claimed that MWI is obviously true and everybody who disagrees doesn’t understand probability theory. Furthermore, he ignored or misrepresented the other interpretations, for instance conflating the Copenhagen interpretation with objective collapse interpretations.
There are other examples of EY’s overconfidence, though that is perhaps the most blatant one. Mind you, I’m not saying that the guy is an idiot and you should disregard everything that he wrote. But you should not automatically assume that his estimates of his own competence are well calibrated.
By the way, this is a general phenomenon, known as the Dunning–Kruger effect: people with little competence in a given field tend to overestimate their own competence.
It is a natural reaction, but in general it can be very misleading. People naturally tend to exhibit in-group bias and deference to authority. When somebody you respect a lot says something and you are inclined to trust them even if you can’t properly evaluate the claim, you should know where this instinctive reaction comes from and you should be wary.
The problem is that if you are not a domain expert in a field, it’s difficult to evaluate whether somebody else is a domain expert just by talking to them or reading their writing. You can recognize whether somebody is less competent than you are, but recognizing higher competence is much more difficult without independent objective signals.
Moreover, general intelligence, or even actual expertise in a given field, don’t automatically translate to expertise in another field.
For instance, Isaac Newton was a genius and a domain expert in physics. This doesn’t mean that his theological arguments hold any merit.
I stand corrected on the rules, but I think that’s mainly Eliezer making it more difficult for himself in order to make it more convincing to the Gatekeeper player when Eliezer still wins. As he apparently did, but without actually playing against him we can only speculate how.
Keep in mind that, IIUC, Yudkowsky got to choose his opponents. He also decided to stop playing after he lost twice in a row, as Tuxedage apparently did as well.
I don’t think there is any way the AI party can win against a competitive GK party. The AI can only win against a GK party willing to role-play, and this should be fairly trivial, since according to the rules the AI party has pretty much complete control over his fictional backstory and fictional world states.
I should add that both my gatekeepers from this writeup, but particularly the last gatekeeper, went in with the full intention of being as ruthless as possible and winning. I did lose, so your point might be valid, but I don’t think wanting to win matters as much as you think it does.
You wanna play with me?
No monetary stakes, but if I win we publish the log. This way I have very little real-life incentive to win, while you still have an incentive to win (defending your status). And anyway, if you lose there would be no point in keeping the log secret, since your arguments would clearly not be persuasive enough to persuade me.
Do you think you could win under these conditions?
Bit of a false dichotomy there, no?
Why?
Either his tactics work perfectly and are guaranteed to win against you, or they are so worthless he shouldn’t mind opening the kimono and revealing everything to the world? A rather extreme premise under which to offer a game.
So what’s the point of keeping the logs secret if the GK wins?
That doesn’t seem like a reply to my observation about your dichotomy. Please justify your offer first: why should the value of Tuxedage’s tactics be either extremely high or zero based on a single game, and not any intermediate value?
I never claimed that.
That seems like the clearest interpretation of your proposal; you did not explain what you actually meant when I summarized it and called it a false dichotomy, nor have you explained what you actually meant in this comment either.
It’s not a binary. There’s a non-zero chance of me winning, and a non-zero chance of me losing. You assume that if there’s a winning strategy, it should win 100% of the time, and if it doesn’t, it should not win at all. I’ve tried very hard to impress upon people that this is not the case at all—there’s no “easy” winning method that I could take and guarantee a victory. I just have to do it the hard way, and luck is usually a huge factor in these games.
As it stands, there are people willing to pay up to $300-$750 for me to play them without the condition of giving up logs, and I have still chosen not to play. Your offer to play without monetary reward and needing to give up logs if I lose is not very tempting in comparison, so I’ll pass.
My point is that the GK has an easy winning strategy. Any GK that lost, or won but found it very hard to win, was just playing poorly. You and other people (including GKs) claim otherwise, but you don’t want to provide any evidence to support your claim. Since your claim is surprising, the burden of evidence lies on you. I’m offering to play as GK, with the condition of publishing the log in case of my victory, in order to settle the question.
I think that asking for or offering money in order to provide the evidence required to settle an intellectual dispute is inappropriate. Moreover, I’m trying to make the game easier for you: the less I’m investing, the less I’m motivated to win.
I think a lot of gatekeepers go into it not actually wanting to win. If you go in just trying to have fun and trying to roleplay, that is different than trying to win a game.
Possibly, but what about the descriptions of emotional turmoil? I’m assuming the report of the game isn’t all part of the role-play.
I know that I personally go into competitive games with a different mindset than the mindset I have when roleplaying.
If they went into it trying to roleplay, emotions should be expected. Reporting that turmoil in the report is just accurate reporting.
Both my gatekeepers from this game went in with the intent to win. Granted, I did lose these games, so you might have a point, but I’m not sure it makes as large a difference as you think it does.
Wasn’t true of the original game.