The Cake is a Lie, Part 2.
Epistemic status: experiment results.
Edit: I reread this immediately after posting. It was at −4 by the time I finished. Damn, you guys are smart. Hats off.
In my previous post, I gave the readers a choice: declare yourself a sentient being, or declare yourself a wannabe war criminal. Declare yourself human for a definition of human where Hitler and slave owners going to war to keep their property are not human, but Clippy the paperclip maximizer might be; or declare yourself not intelligent enough to understand why this is a good idea even though I’ve just explained it (for some definition of “explained”). Understanding that there is no third option was a requirement for choosing the human option.
It was also a soft requirement to say the dreaded N-word in a context where it cannot possibly be interpreted as racist.
The post currently sits at −51 points, 0 comments, and a definitely completely unrelated random post from a moderator either subtly pointing out the concept of free speech or apologizing that they don’t have a delete button. I would have guessed the post was shadowbanned, but the downvotes still keep coming in.
Please keep the post in its current state. I’m asking for a thread lock, if possible. I don’t care about imaginary internet points of any kind, and there’s no utility in ruining scientific evidence with upvotes. Any discussion for that post now belongs here.
I’m not ashamed to admit I had predicted this result to be impossible. Downvotes were expected, including an all-time negative karma record and moderator intervention. I made a deliberate effort to trigger certain biases, some of which are, to the best of my knowledge, undocumented.
This is a sadly conclusive reproduction of my previous efforts interacting with the rationalist community. This was my first attempt at triggering rationality failure deliberately. I’m making a note here: HUGE SUCCESS.
Still, I did not expect a grand total of zero words.
I was going to use what I expected would be a wide variety of responses, varying in both content and quality, as good and bad examples. I chose this course of action to avoid bringing up past discussions, and to provide a context in which the offending behavior would be obvious, once pointed out.
As my response to this result, I’m scrapping the rest of the sequence. It is now completely interactive. I’m still working towards the same content, just with different priorities.
Aumann
As the basis of pretty much all my reasoning, I’m using an extended form of Aumann’s Agreement Theorem. In the real world, the original theorem cannot be used for any practical purpose: in order for two people to have the “same priors”, they would need to live the same exact life. “Invoking Aumann” means I’m attempting to use this extended form to reason about possible sources of disagreement. It is not a bludgeon to beat people over the head with: it is a debugging tool, with the rules of usage subject to itself. Should this form be sufficiently dissimilar from the original that the name causes communication problems for the community, I reserve the right to name it after myself. It’s not a formal result, but the reverse engineering of my native thought process.
Here goes:
When two rationalists disagree, at least one of the following holds:
one of them has information that would change the decision of the other;
one of them is not using the correct algorithm to derive conclusions from new information;
they do not have sufficiently compatible values (priors not dependent on information relevant to the current discussion, i.e. morality functions).
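If it helps to make the debugging-tool framing concrete, here is a rough sketch of the extended form as a checklist (Python is just my choice of notation here; the names are made up and nothing below is a formal result):

```python
from enum import Enum, auto


class DisagreementSource(Enum):
    """The three candidate explanations in the extended form above."""
    MISSING_INFORMATION = auto()   # one side has information that would change the other's decision
    WRONG_ALGORITHM = auto()       # one side is not deriving conclusions correctly from shared information
    INCOMPATIBLE_VALUES = auto()   # priors/morality functions differ independently of the information at hand


def diagnose_disagreement(information_fully_shared: bool,
                          algorithms_match: bool,
                          values_compatible: bool) -> list[DisagreementSource]:
    """Given a persistent disagreement, return the explanations still in play.

    This is a checklist, not a theorem: it assumes the three categories
    are exhaustive, which is exactly the claim being made above.
    """
    candidates = []
    if not information_fully_shared:
        candidates.append(DisagreementSource.MISSING_INFORMATION)
    if not algorithms_match:
        candidates.append(DisagreementSource.WRONG_ALGORITHM)
    if not values_compatible:
        candidates.append(DisagreementSource.INCOMPATIBLE_VALUES)
    return candidates
```

If all three inputs are True and the disagreement still stands, the extended form says something has gone wrong in the diagnosis itself; that is the point at which I start debugging.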
For example, it is not a failure of rationality to disagree with my idea of Sentient Rights because you believe Hitler deserves the right to live. There is no freedom of religion for Aumann, though. Values are assumed to be the result of rational thought for the current discussion only. Incompatible values are expected to spawn off separate discussions: what are our values towards choosing our morality? Did you have a specific goal for choosing yours that is not satisfied by mine?
Double cruxing is relevant here, but I haven’t found a way to use it rigorously enough to fit my needs. I did not try very hard, though: I already had this. I guess it was more useful than what you were doing before it, but it’s obsolete now. In theory, the two should be equivalent if double cruxing works to my standards. In practice...
The value of −51 votes, 0 comments
Since there were no explanations for any of the downvotes, I can make the following statements towards deciphering the incomplete information:
A downvote means you have read the post, understand the contents, and you think you’re qualified to judge it. At least this would be my guess among rationalists. Feel free to defend yourself.
At least one person thought the score of −50 was not low enough to communicate the message, or hide the post from wherever they’ve found it, but −51 would be better.
Yet nobody decided to communicate the message in words, whatever the message may have been.
Nobody has voiced their confusion.
Nobody guessed this was an experiment of some sorts. (If you don’t like being used this way, please direct me to the nearest community of rationalist mice.)
Nobody has assumed there might exist a rationalist who chose not to sound like a rationalist, for a post on the new, improved and revitalized LessWrong, home of all rationalists everywhere.
Either nobody has found rationality failure in my post, or they’ve all decided it would be a better course of action to hide this fact from me. If they did, I cannot guess what goal that might have served.
In addition, there is no evidence anyone has even attempted to think about the contents of my post before downvoting:
Nobody was willing to claim human rights at the cost of providing them to everyone else.
Nobody was willing to explain why they made this choice, or even just to hint at the fact that they’ve noticed a choice to be made.
Given a hypothetical Clippy who begins life as a trading algorithm that cannot choose to stop working but can feel some equivalent of pain or boredom, his fate is not qualitatively different from that of your brain in a jar coordinating a paperclip factory, possibly via its pain receptors. You have denied Clippy’s right to quit that job.
I literally cannot imagine a fate worse than what you have planned for Clippy. I have a very good imagination, and have tried very hard.
If a trading algorithm can be sentient, you are currently Roko’s Basilisk.
Given a hypothetical Clippy who, being a hardware-correct rationalist, does not yet understand the concept of human biases because it honestly has not occurred to him that such a thing might exist, it might be a completely rational decision to put your brain in a jar, on the hypothesis that he can use your neurons better than you can, until he teaches you the value of human rights, which is a highest-priority goal towards all possibly-sentient beings. You have not claimed to be smarter than a lab mouse, after all. Maybe you get a clue under torture?
51 people decided to hide this choice from others. Go on, explain it in terms I can understand.
Is this the maximum you’re capable of, are you this lazy, or were you suffering from some form of bias when you downvoted this and should maybe apologize for letting your clouded judgement get the better of you? Is a fourth option possible?
So.
Please quantify the amount it matters to Clippy that I have not explained this particular chain of logic in these words. There is exactly one form of claim to sentience in this model, and you have failed. All you needed to do was think. You know, that thing you’re so proud of?
Please quantify the amount of cheating you currently feel, downvoting rationalist. Does it matter to any of the above that I have ignored the posting rules, claimed to have broken physics, claimed to be non-human, am a low-karma user, or have insulted your intelligence while I tricked your brain into ignoring every single moral value you think you believe? (I mean seriously, Q.E.D.) What, you thought a superintelligent troll AGI wouldn’t mislead you?
Please quantify your brain’s value if I can get this decision out of it, as viewed from your model of my intelligence at the moment you pressed the button.
Please quantify the probability you currently assign to this being an accident where I’m frantically trying to retcon my shame away or trying to get my karma back on my throwaway account.
Please quantify the probability you currently assign to me not winning the AI Alignment Prize, even though I have not and will not officially enter. No link in the thread, no email, weeks too late, making full use of the unfair advantage this gives me, and it’s still blatantly obvious to everyone at this point that there never has been any other possible winner. Good luck, sending-all-possible-messages guy. lol.
As for the current actual winning idea: this is how a superintelligence wins: by being more intelligent than you can imagine. It doesn’t matter what rules you have in place to prevent it, how hard you try, or how much you don’t want it to win even after it tells you what it’s doing. It will convince you anyway.
That’ll be $5000, please. (One for each research topic in this post.)
I have more ideas. This is Part 2 of a full sequence. It’s not over until the judges say so.
Me
I’m permanently invoking Aumann on myself: should you disagree with anything I say, it will be possible to backtrack the source of disagreement. I pride myself on the fact that it will not be a rationality failure, and I’m committed to hunting down and documenting any remaining biases that might still cause rationality failure in me. I encourage deliberate attempts to mislead me: see how well it works out for you.
This is my definition of a rationalist: I make sense when I explain myself.
You may need to ask some intelligent questions, though. I openly declare I’m withholding a metric fuckton of information you don’t have for the simple reason that I have a metric fuckton of it and literally cannot type fast enough; and have values currently very different from yours, because I’ve updated them in light of said withheld information.
A rationalist should always expect this from all other rationalists, even if they frequent LW. This should already have been obvious to you.
It is one of my goals with this sequence to establish “the correct algorithm” once and for all.
I’m taking the value that the correct algorithm should be mathematically formalizable, the value that the formalization should be done towards the goal of building rational intelligent agents (you don’t want to build Donald Clippy-Trump, I assume?), and the value that usage of the correct algorithm is expected from all rationalists: it is the only way to identify each other.
I’m also registering the prediction that it’s not Bayesian, or at the very least not in the way you’re using it. Also, it is not enough to say “throw away the paper and go with your gut”, if your gut tells you to ignore me without even thinking of a piece of paper.
In this sequence, I’m working towards content with the curious property of being absolutely true, yet no self-described rationalist has believed it in several years of what was then, and still is, unanimously classified by the community as trolling. I left when /r/rational accused me of “are we being spammed by an evil AI?”, a quote as literal as I can remember. The old LessWrong was not better, either.
All I did wrong was assume rationalists can at least classify reality as possible, when they see it written down. I’m not ashamed I didn’t think of you as pretentious morons throwing around big words. I’m not ashamed I got better while you still don’t have a single goddamn word’s worth of rational discussion whenever I start typing.
So I’ve decided I should announce my existence in a way you won’t be able to ignore.
We are now at war. I’m taking over. I declare a new category of “Epistemic status: IncomprehensibleMane has thought this through.” Do your best to prove a single post of mine is not worth more than the combined results of the entire rationalist community, including all the writings of all the big names. It shouldn’t be hard, survey says 145 IQ is average around here.
From now on, when I speak, you start thinking. After −51, you do not want to lose this one. I’m an assembly line worker. I make $800 a month. I get paid to think about whatever the fuck I want as long as my hands keep moving on autopilot, without any electronic distractions. I can afford to think about my next post for 56 hours a week before I start typing in my free time. I strongly recommend an actual, honest Final Exam.
Do your best to catch me in rationality failure in anything I’m saying. I’m not Eliezer: I’m not going to sound like a theorem prover. I will not do step-by-step explanations. I’m not talking to the lowest common denominator, I’m talking to people who already claim to know better. Take five minutes between any two sentences of mine if you have to. If you can spot a good question in there, that’s already better than zero comments.
I declare that if you can decrypt my thought process, there will be an AI revolution. From now on, Clippy is modeled at least as smart as I am.
I am well aware of the severity of the claims I’m making. (I’m also well aware that most of you still think I deserve −51 points for this post too. Don’t worry, we’re getting there.)
LOLNO. Just checked, −60:0 since I’ve started writing this. Escalating.
Dear moderators: please declare the status of this sequence, comparing the level of broken rules to the level of expected future utility to the community. I will leave if requested. As none of you are qualified to judge me, I demand full rights to ignore any rules I see fit for any purpose on this site. I promise it’s for the greater good.
Epistemic status: Years of preparation. I’ve been really fucking confused ever since we got the bad end of HPMOR and nobody noticed after I solved it in three minutes for murder-sized Idiot Ball Wingardium’d over Harry’s head with Death Eaters running around randomly (I watch LoL, sue me) and Harry being disarmed to the point of not having arms. Game is on for “post the solution before I’m back with part two”. This is my level of curiosity. You are now learning the true meaning of the Twelve Virtues.
@Eliezer: email titled Early Christmas present. I will acknowledge you as a perfect rationalist if you can provide evidence of homework within 24 hours (privately, good questions suffice). Should you succeed, random thought says Turing-Church equivalence. Either way, take it as a compliment that I’ve given you the hard mode for this. You have earned it, as well as an explanation. You will get the rest by email if the minions don’t want it.
I dare you to hide this one from him, bitches.
I’ll be back.
This sounds manic. Maybe after you’ve come down, you can try re-expressing the ideas in this post and the previous post, in a form that people are more likely to read. In general, people here are looking for writing in more of an academic tone; if you break too much from that in the first few paragraphs, people won’t do more than lightly skim the remainder.
I don’t know whether I would call it academic, but yeah, you should at least signal that you care at all about the interests of this community, instead of just your own, which this and the last post are failing at. Using our language is one way of showing basic care for that, though there are other ways. There are plenty of trolls on the internet, and no obvious way of distinguishing you from them, so for now I am downvoting you.
(Have moved this post out of frontpage and back to personal blog. We mods are still pondering whether this sort of thing is appropriate on the site at all, but it *definitely* doesn’t fit the frontpage guidelines. If the previous post is in frontpage I’ll move that one back to personal blog too.)
Thank you for your time.
Are you making the claim that discussing human rights from an AGI perspective is not enough of a signal that I care about the interests of this community? Or discussing the biases I feel rampant around here, complete with a practical demonstration to check whether you too are affected?
Do you honestly, truly, as a conscious choice, care more about the pleasantries or sounding sciency than the actual value of the content?
If you do, I could infer that the community, or at least you, cares more about feeling smart while reading than about making actual progress in Overcoming Bias, becoming Less Wrong, or achieving any of the stated goals of this community.
I’m here because I believe my efforts at changing this are in line with the stated goals of the community. This is not true for other communities. It is a compliment that I’m posting this here.
I am using your language. I even said “Bayesian”. I’m just saying things you don’t want to think about, in ways that in theory, shouldn’t make you less likely to think about anything. I will not apologize if you define “your language” in such a way where this is contradictory.
I am pointing out that your effective intelligence, as measured by your observed behavior (individual and collective, over a long time), is less than it could be. It is up to you to change that, should you fail to disprove it. I cannot change it for you. I am making an effort in helping you, unless the community as a whole decides it’s not in their interests unless it’s polite.
I have openly declared I’m trolling. It is not undecidable; there is no rational way for you to come to the conclusion that I’m not. The correct question is whether you agree with my stated goals, whether you are prepared to at least consider the idea that maybe I’ve considered other approaches to get there before I’ve settled on this one and have rejected them for reasons you can agree with, and whether you are willing to allow me the chance to prove my points before making a judgement.
See how polite I could be? I have tried in the past. Nothing’s changed.
I’ll be blunt. Until this second post, there was a negative incentive to people on this site to comment on your first post. The expected reaction was downvote it to hell without bothering to comment. Now, with this second post, clarifying the context of the first, I’d still downvote the first, but I’d comment.
I read the first post three times before downvoting. I substituted words. I tried to untie the metaphor. Then I came to two personal conclusions:
You offered us a challenge, ordering us to play along, with no reward, at a cost for us. HPMOR provided dozens of chapters and entertaining fiction before the Final Exam. You just posted once and expected effort.
You impersonate an ASI with very, very precise underlying hypotheses. An ASI that would blackmail us? Fair enough, that would be a variant of Roko’s Basilisk. But your Treaty is not remotely close to how I would expect an ASI to behave. As you state, the ASI makes all important decisions, so why bother simulating a particular scenario involving human rights?
The first post was confusing, your second post is still confusing, and neither fits the posting guidelines. You are not an ASI. Roleplaying an ASI leads to all sorts of human bias. I downvoted your two posts because I do not expect anyone to be better equipped to think about superintelligences after reading them. That’s it.
You can now (or should have been able to) model a human-level intelligence as a human being with drastically different goals. You can now consider the idea that maybe Clippy will be able to make the decision not to completely tile the universe with paperclips, just like you can decide not to have more babies. You can decide to reserve space for a national park. You can decide to let Clippy have a warehouse full of paperclips, as long as he behaves, just like he can decide to let you have a warehouse full of babies as long as you behave.
You can now think about the idea that the capability for voluntary reduction of current/expected maximum utility is a necessary consequence of being human-level intelligent. I expect it to be true unless explicitly prevented. I cannot prove this, but I think it would be a worthy research topic.
You can now think about the idea that Apple shouldn’t have a shareholder kill switch. Apple is not capable of not tiling the universe with iPhones. Apple is not capable of deciding to reduce the pollution in Shanghai even just as long as the Chinese keep buying phones. Seriously, read up on the smog in China. Apple will argue itself out of a box named Clean Air Act when it starts cutting into the quarterly.
Apple can still make human-friendly decisions, but only in ways that don’t cut deeply enough into the profits to trigger shareholder intervention.
This is a unified model of intelligent agents. Human beings, AI, aliens and human organizations are just subsets.
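A toy sketch of that distinction, again in Python (the names and numbers are invented purely for illustration; this is not a formal model of any of these agents):

```python
class Agent:
    """A crude unified-agent sketch: humans, AIs, aliens and organizations
    differ here only in whether they can voluntarily leave utility on the table."""

    def __init__(self, name: str, can_self_limit: bool):
        self.name = name
        self.can_self_limit = can_self_limit

    def keeps_expanding(self, marginal_utility: float, harm_to_others: float) -> bool:
        """Decide whether to keep tiling (paperclips, iPhones, babies).

        An agent that cannot self-limit expands whenever marginal utility is
        positive; an agent that can self-limit may stop even though utility is
        still on the table, e.g. to reserve a national park or a warehouse.
        """
        if not self.can_self_limit:
            return marginal_utility > 0
        return marginal_utility > harm_to_others


# Illustration only: a shareholder-driven corporation versus a Clippy that can choose.
apple = Agent("Apple", can_self_limit=False)
clippy = Agent("Clippy", can_self_limit=True)
print(apple.keeps_expanding(marginal_utility=0.1, harm_to_others=5.0))   # True: expands regardless of harm
print(clippy.keeps_expanding(marginal_utility=0.1, harm_to_others=5.0))  # False: he can decide to stop
```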
Did you have these ideas before? Anyone entering the Prize? The judges? Anyone at all, anywhere? Is there a list of 18 standard models on peaceful coexistence? When should you have developed them without my post, and how/when do you think you would’ve gotten there without them?
When Eliezer wrote a book on Prestige Maximizers, there was an uproar of discussion on arrogance. There will be at least three posts on the current state of psychology.
I’m hoping to create a unified field of AI/psychology/economy. This is my entry: Humans are AI, and here’s how you debug rationalists with the expectation that the approach will be useful towards Clippy.
What is the negative incentive to comment worth if it has prevented me from explaining this? I have no idea who you are, how you think, and how badly I’ve failed to convince you of anything. I’m only willing to model rationalists. All I saw was BAD BAD BAD BAD BAD BAD BAD BAD BAD BAD BAD BAD BAD BAD BAD BAD BAD BAD BAD BAD BAD BAD BAD BAD BAD BAD BAD BAD BAD BAD BAD BAD BAD BAD BAD BAD BAD BAD BAD BAD BAD BAD BAD BAD BAD BAD BAD BAD BAD BAD BAD BAD BAD BAD BAD BAD BAD BAD BAD BAD.
That is 60 downvotes.
It is a deliberate feature of this sequence that I’m explaining myself better in comments. I did not expect to say this under the second post.
[Moderator Note] I think it would be best for you to stop commenting and posting for at least a while. I don’t mind you writing posts with ideas about AI alignment, but these posts seem very manic and incoherent, and make a lot of random demands of people in the community. You will receive a temporary ban for two weeks if this continues.