Absolutely not.
I definitely have a mini Alice voice inside my head. I also have a mini Bob voice inside my head. They fight, like, all the time. I’d love help in resolving their fights!
If Bob isn’t reflectively consistent, their utility functions could currently be the same in some sense, right? (They might agree on what Bob’s utility function should be—Bob would happily press a button that makes him want to donate 30%, he just doesn’t currently want to do that and doesn’t think he has access to such a button.)
Huh, interesting! I definitely count myself as agreeing with Alice in some regards—like, I think I should work harder than I currently do, and I think it’s bad that I don’t, and I’ve definitely done some amount to increase my capacity, and I’m really interested in finding more ways to increase my capacity. But I don’t feel super indignant about being told that I should donate more or work harder—though I might feel pretty indignant if Alice is being mean about it! I’d describe my emotions as being closer to anxiety, and a very urgent sense of curiosity, and a desire for help and support.
(Planned posts later in the sequence cover things like what I want Alice to do differently, so I won’t write up the whole thing in a comment.)
I think if someone wasn’t indignant about Alice’s ideas, but did just disagree with Alice and think she was wrong, we might see lots of comments that look something like: “Hmm, I think there’s actually an 80% probability that I can’t be any more ethical than I currently am, even if I did try to self-improve or self-modify. I ran a test where I tried contributing 5% more of my time while simultaneously starting therapy and increasing the amount of social support that I felt okay asking for, and in my journal I noted an increase in my sleep needs, which I thought was probably a symptom of burnout. When I tried contributing 10% more, the problem got a lot worse. So it’s possible that there’s some unknown intervention that would let me do this (that’s about 15% of my 20% uncertainty), but since the ones I’ve tried haven’t worked, I’ve decided to limit my excess contributions to no more than 5% above my comfortable level.”
I think these are good habits for rationalists: using evidence, building models, remembering that 0 and 1 aren’t probabilities, testing our beliefs against the territory, etc.
Obviously I can’t force you to do any of that. But I’d like to have a better model about this, so if I saw comments that offered me useful evidence that I could update on, then I’d be excited about the possibility of changing my mind and improving my world-model.
What specifically would you expect to not go well? What bad things will happen if Bob greatly ups his efforts? Why will they happen?
Are there things we could do to mitigate those bad things? How could we lower the probability of the bad things happening? If you don’t think any risk reduction or mitigation is possible at all, how certain are you about that?
Can we test this?
Do you think it’s worthwhile to have really precise, careful, detailed models of this aspect of the world?
Hm, my background here is just an undergrad degree and a lot of independent reasoning, but I think you’re massively undervaluing the whole “different reproductive success victory-conditions cause different adaptations” thing. I don’t think it’s fair at all to dismiss the entire thing as a Red Pill thing; many of the implications can be pretty feminist!
I don’t think it matters that much that Bateman’s original research is pretty weak. There’s a whole body of research you’re waving away there, and a lot of the more recent stuff is much much stronger research!
You don’t necessarily have to talk about sexual competition at all. You can just say, for instance, that female reproductive success is bounded—human women in extant hunter-gatherer tribes typically have one child and then wait several years before having the next. If a woman spends twenty years having children and can only have one child per four years, then she’s only going to have five children. Her incentives are to maximise the success of those five children and the resources she can give each child. Meanwhile, a man could have anywhere between zero children and… however many Genghis Khan had, so his incentives tend much more strongly towards risk-taking and having as much sex as possible.
Of course there’s massive variation between species; there’s massive variation in how any and every trait/dynamic plays out depending on the context and environment. But we can generally come up with reasons why particular species might work the way they do; for example, I’ve heard the hypothesis that the fish species with very tiny males are adapted for the fact that finding a conspecific female to mate with in the gigantic open ocean is basically random chance, so there’s no point investing in males being able to do anything except drift around and survive for a long period until they stumble across a female. Humans aren’t a rare species in a giant open ocean, so male humans don’t really have to rely on stumbling across female humans through sheer luck after weeks of drifting on the currents.
You don’t have to bring “males face more competition than females” into it at all. You can just say “whichever parent has higher parental investment is likely to have stricter bounds on reproductive success, so they’ll adapt to compete more over resources like food, while the low-investment parent competes more over access to mates”. Then when you look at specific species, you can analyse how sexual dimorphism in that particular species is affected by the roles each sex plays in that species and also by the species’ context and environment.
Sometime when it’s not 2am, if it’d be helpful, I’d be happy to pull out some examples of papers that I think are well-written or insightful. Questions like “ok, so, if males maximise their fitness by having as many mates as possible, what the heck is going on with monogamy? Is there even any evidence that human men in extant hunter-gatherer tribes have much variation in their reproductive success caused by being good at hunting or being high-status or whatever? For that matter, what the heck is going on with meerkats?” are genuinely interesting open research questions; I don’t really think they’re associated with the red pill people; they’re things the field is approaching with a sense of curiosity and confusion. Also, I would really like to know what the heck is going on with meerkats.
You doubt that it would work very well if Alice nags everyone to be more altruistic. I’m curious: how confident are you that this doesn’t work, and are there alternative techniques you’d propose that might work better?
For myself, I notice that being nagged to be more altruistic is unpleasant and uncomfortable. So I might be biased to conclude that it doesn’t work, because I’m motivated to believe it doesn’t work so that I can conveniently conclude that nobody should nag me; so I want to be very careful and explicit in how I reason and consider evidence here. (If it does work, that doesn’t mean it’s good; you could think it works but the harms outweigh the benefits. But you’d have to be willing to say “this works but I’m still not okay with it” rather than “conveniently, the unpleasant thing is ineffective anyway, so we don’t have to do it!”)
(PS. yes, I too am very glad that people like Bob exist, and I think it’s good they exist!)
Word of God, as the creator of both Alice and Bob: Bob really does claim to be an EA, want to belong to EA communities, say he’s a utilitarian, claim to be a rationalist, call himself a member of the rationalist community, etc. Alice isn’t lying or wrong about any of that. (You can get all “death of the author” and analyse the text as though Bob isn’t a rationalist/EA if you really want, but I think that would make for a less productive discussion with other commenters.)
Speaking for myself personally, I’d definitely prefer that people came and said “hey we need you to improve or we’ll kick you out” to my face, rather than going behind my back and starting a whisper campaign to kick me out of a group. So if I were Bob, I definitely wouldn’t want Alice to just go talk to Carol and Dave without talking to me first!
But more importantly, I think there’s a part of the dialogue you’re not engaging with. Alice claims to need or want certain things; she wants to surround herself with similarly-ethical people who normalise and affirm her lifestyle so that it’s easier for her to keep up, she wants people to call her out if she’s engaging in biased or motivated reasoning about how many resources she can devote to altruism or how hard she can work, she wants Bob to be honest with her, etc. In your view, is it ever acceptable for her to criticise Bob? Is there any way for her to get what she wants which is, in your eyes, morally acceptable? If it’s never morally acceptable to tell people they’re wrong about beliefs like “I can’t work harder than this”, how do you make sure those beliefs track truth?
Those questions aren’t rhetorical; the dialogue isn’t supposed to have a clear hero/villain dynamic. If you have a really awesome technique for calibrating beliefs about how much you can contribute which doesn’t require any input from anyone else, then that sounds super useful and I’d like to hear about it!
I don’t think anyone would dispute that Alice is being extremely rude! Indeed, she is deliberately written that way (though I think people aren’t reading it quite the way I wrote it, because I intended them to be housemates or close friends, so Alice would legitimately know some amount about Bob’s goals and values).
I think a real conversation involving a real Bob would definitely involve lots more thoughtful pauses that gave him time to think. Luckily it’s not a real conversation, just a blog post trying to stay within a reasonable word limit. :(
Alice is not my voice; this is supposed to inspire questions, not convince people of a point. For instance: is there a way to achieve what Alice wants to achieve, while being polite and not an asshole? Do you think the needs she expresses can be met without hurting Bob?
Alice is, indeed, a fictional character—but clearly some people exist who are extremely ethical. There are people who go around donating 50%, giving kidneys to strangers, volunteering to get diseases in human challenge trials, working on important things rather than their dream career, thinking about altruism in the shower, etc.
Where do you think is the optimal realistic point on the spectrum between Alice and Bob?
Do you think it’s definitely true that Bob would be doing it already if he could? Or do you think there exist some people who could but don’t want to, or who have mistaken beliefs where they think they couldn’t but could if they tried, or who currently can’t but could if they got stronger social support from the community?
This seems like you understood my intent; I’m glad we communicated! Though I think Bob seeing a therapist is totally an action that Alice would support, if he thinks that’s the best test of her ideas—and importantly if he thinks the test genuinely could go either way.
I’m sorry I don’t have time to respond to all of this, but I think you might enjoy Money: The Unit Of Caring: https://www.lesswrong.com/posts/ZpDnRCeef2CLEFeKM/money-the-unit-of-caring
(Sorry, not sure how to make neat-looking links on mobile.)
Hmm, this isn’t really what I’m trying to get across when I use the phrase “least convenient possible world”. I’m not talking about being isekai’d into an actually different world; I’m just talking about operating under uncertainty, and isolating cruxes. Alice is suggesting that Bob—this universe’s Bob—really might be harmed more by rest-more advice than by work-harder advice, really might find it easier to change himself than he predicts, etc. He doesn’t know for certain what’s true (i.e. “which universe he’s in”) until he tries.
Let’s use an easier example:
Jane doesn’t really want to go to Joe’s karaoke party on Sunday. Joe asks why. Jane says she doesn’t want to go because she’s got a lot of household chores to get done, and she doesn’t like karaoke anyway. Joe really wants her to go, so he could ask: “If I get all your chores done for you, and I change the plan from karaoke to bowling, then will you come?”
You could phrase that as, “In the most convenient possible world, would you come then?” but Joe isn’t positing that there’s an alternate-universe bowling party that alternate-universe Jane might attend (but this universe’s Jane doesn’t want to attend because in this universe it’s a karaoke party). He’s just checking to see whether Jane’s given her REAL objections. She might say, “Okay, yeah, so long as it’s not karaoke then I’ll happily attend.” Or she might say, “No, I still don’t really want to go.” In the latter case, Joe has discovered that the REAL reason Jane doesn’t want to go is something else—maybe she just doesn’t like him and she said the thing about chores just to be polite, or maybe she doesn’t want to admit that she’s staying home to watch the latest episode of her favourite guilty-pleasure sitcom, or something.
If “but what about the PR?” is Bob’s real genuine crux, he’ll say, “Yeah, if the PR issues were reversed then I’d commit harder for sure!” If, on the other hand, it’s just an excuse, then nothing Alice says will convince Bob to work harder—even if she did take the time to knock down all his arguments (which in this dialogue she does not).
Hmm, I don’t know how this got started, but once it got started, there’s a really obvious mechanism for continuing and reinforcing it; if you’re gay and you want to meet other gay people, and you’ve heard there’s gay people at the opera, then now you want to join the opera. Also if you’re really homophobic and want to avoid gay people, then you’ll avoid the opera, which might then make it safer for gay people.
I could randomly start a rumour that gay people really love aikido, and if I was sufficiently successful at getting everyone to believe it, then maybe it’d soon become a self-fulfilling prophecy—since homophobes would pull out of aikido, and gay people would join.
I wonder how you could test this? Could you just survey some opera fans about why they enjoy opera, and see if anything shows up as correlated on the demographic-monitoring bit of the survey? Could you do some kind of experimental design where people are told about an imaginary new hobby you’ve invented, and told that it’s common or uncommon among their demographics, and then they rate how interested they are in the hobby?
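If it helps make the second design concrete, the analysis could be as simple as a chi-squared test of independence. Here’s a rough sketch in Python; every number in it is made up, purely to show the shape of the test:

```python
# Hypothetical analysis sketch for the "invented hobby" experiment; all counts are made up.
from scipy.stats import chi2_contingency

# Rows: what participants were told about the invented hobby.
# Columns: (interested, not interested).
observed = [
    [38, 62],  # told the hobby is common among "people like you"
    [21, 79],  # told the hobby is rare among "people like you"
]

chi2, p, dof, expected = chi2_contingency(observed)
print(f"chi2={chi2:.2f}, p={p:.3f}")  # a small p would suggest the framing shifts stated interest
```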
Hmm, I think I could be persuaded into putting it on the EA Forum, but I’m mildly against it:
- It is literally about rationality, in the sense that it’s about the cognitive biases and false justifications and motivated reasoning that cause people to conclude that they don’t want to be any more ethical than they currently are; you can apply the point to other ethical systems if you want, like, Bob could just as easily be a religious person justifying why he can’t be bothered to do any pilgrimages this year while Alice is a hotshot missionary or something. I would hope that lots of people on LW want to work harder on saving the world, even if they don’t agree with the Drowning Child thing; there are many reasons to work harder on x-risk reduction.
- It’s the sort of spicy that makes me worried that EAs will consider it bad PR, whereas rationalists are fine with spicy takes because we already have those in spades. I think people can effectively link to it no matter where it is, so posting it in more places isn’t necessarily beneficial?
- I don’t agree with everything Alice says but I do think it’s very plausible that EA should be a big tent that welcomes everyone—including people who just want to give 10% and not do anything else—whereas my personal view is that the rationality community should probably be more elitist; we’re supposed to be a self-improve-so-hard-that-you-end-up-saving-the-world group, damnit, not a book club for insight porn.
- Also it’s going to be part of a sequence (conditional on me successfully finishing the other posts), and I feel like the sequence overall belongs more on LW.
I genuinely don’t really know how the response to the Drowning Child differs between LW and EA! My guess is that more people on the EA Forum donate money to charity for Drowning-Child-related reasons, while more people on LW are interested in philosophy qua philosophy, and more of them switched careers to work directly on things like AI safety. I don’t suppose there’s survey/census data that we could look up?
I have enough integrity to not pretend to believe in CDT just so I can take your money, but I will note that I’m pretty sure the linked Aaronson oracle is deterministic, so if you’re using the linked one then someone could just ask me to give them a script for a series of keys to press that gets a less-than-50% correct rate from the linked oracle over 100 key presses, and they could test the script. Then they could take your money and split it with me.
Of course, if you’re not actually using the linked oracle and secretly have a more sophisticated oracle which you would use in the actual bet, then you shouldn’t be concerned. This isn’t free will; this could probably be brute-forced.
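To make the “brute-forced” point concrete, here’s a toy sketch in Python. The ToyOracle below is just my own guess at the general shape of such a thing (a deterministic frequency predictor over the last few keypresses), not the linked oracle’s actual code; but if the real one is deterministic, the same trick applies: simulate it, and at every step press whichever key it isn’t about to predict.

```python
# Toy sketch only: ToyOracle is a stand-in I made up for "a deterministic
# predictor over recent keypresses", not the linked oracle's real code.
from collections import defaultdict

class ToyOracle:
    def __init__(self, order=5):
        self.order = order                                   # length of the key-history window
        self.counts = defaultdict(lambda: {'f': 0, 'd': 0})  # window -> counts of what followed it
        self.history = ''
        self.correct = 0
        self.total = 0

    def predict(self):
        # Predict whichever key more often followed the current window in the past.
        c = self.counts[self.history[-self.order:]]
        return 'f' if c['f'] >= c['d'] else 'd'

    def update(self, key):
        self.total += 1
        self.correct += (self.predict() == key)
        self.counts[self.history[-self.order:]][key] += 1
        self.history += key

# Because everything above is deterministic, a script can simulate the oracle
# and always press the opposite of whatever it would predict:
oracle = ToyOracle()
script = []
for _ in range(100):
    key = 'd' if oracle.predict() == 'f' else 'f'
    script.append(key)
    oracle.update(key)

print(oracle.correct / oracle.total)  # 0.0 against this toy version
print(''.join(script))                # a replayable 100-key sequence
```

Replaying the printed script against the real oracle would then be a cheap test of whether my guess about its internals is close enough.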
I disagree that we’re confusing multiple issues; my central point is that these things are deeply related. They form a pattern—a culture—which makes bike theft and rape not comparable in the way the OP wants them to be comparable.
You might not think that 4 through 6 count as ‘victim-blaming’, but they all contribute to the overall effect on the victim. Whether your advice is helpful or harmful can depend on a lot of factors—including whether a victim is being met with suspicion or doubt, whether a victim feels humiliated, and whether a victim feels safe reporting.
If someone is currently thinking, “Hmm, my bike got stolen. That sucks. I wonder how I can get that to not happen again?” then your advice to buy a different lock is probably going to be helpful! The victim is likely to want to listen to it, and be in a good mental state to implement that advice in the near future, and they’re not really going to be worried that you have some ulterior motive for giving the advice. When someone says, “Have you considered buying a different brand of bike lock?” I’m not scared that they’re going to follow-up that question by saying something like, “Well, it’s just that since you admit yourself that you didn’t buy the exact brand of bike lock I’m recommending, I don’t believe your bike was really stolen and I’m going to tell all our friends that you’re a reckless idiot who doesn’t lock their bike properly so they shouldn’t give you any sympathy about this so-called theft.”
If we lived in a world that had this sort of culture around bike theft—victim-disbelieving, victim-shaming, victim-blaming or whatever else you want to call it—then people might be thinking things like, “Oh my god my bike got stolen, I’m so scared to even tell anyone because I don’t know if they’ll believe me, what if they think I made it up? What if they tell me I’m too irresponsible and just shouldn’t ever ride bikes in the future ever again? What if they tell me I’m damaged goods because this happened to me?”
In that world, if someone tells you that their bike was stolen, responding, “did you lock it?” is an asshole thing to do. Because there will be some fraction of people who ask, “did you lock it?” and then, after that, say things like, “well, if you didn’t use a D-lock on both the wheels and the frame, then you probably just consented for someone to borrow it and you’re misremembering. You can’t go around saying your bike got stolen when it was probably just borrowed—I mean, imagine if your bike is found and the person who borrowed it gets arrested! You’d ruin someone’s life just because you misremembered giving them permission to borrow your bike. Next time, if you don’t consent for someone to take your bike, just use at least five locks.”
People who are feeling scared and vulnerable are not likely to be receptive to advice about bike locks, or feeling ready to go to the supermarket and get a new bike lock. If you offer advice about bike locks in that world, instead of thinking, “hmm that’s a great idea, I’ll go to the shops right now and buy that recommended bike lock,” they are more likely to be thinking, “oh fuck are they implying that they don’t believe my bike was really stolen? Are they going to tell my friends that I’m a stupid reckless person because I didn’t lock my bike properly?” In the world where we have a victim-blaming victim-shaming victim-disbelieving culture around bike theft, you need to reassure the bike theft victim that you are not going to be that asshole.
Rationality isn’t always about making the maximally theoretically correct statements all the time. Rationality is systematized winning. It doesn’t matter if the statement “you should buy a better bike lock” is literally true. It matters whether saying that statement causes good outcomes to happen. For bike theft, it probably causes good outcomes; the person hears the statement, goes out and buys a better bike lock, and their bike is less likely to be stolen in future. For rape, it causes bad outcomes; the person worries that you’re not a safe/supportive person to talk to, shuts down, and hides in their room to cry. You can argue that the hide-in-the-room-and-cry trauma response is irrational, and that doesn’t matter even one iota, because being an aspiring rationalist is about taking the actions with the best expected outcomes in an imperfect world where sometimes humans are imperfect and sometimes people are traumatised. You don’t control other people’s actions; you control your own. (And in our imperfect world, it’s not irrational for rape victims to be scared of talking to people who send signals that they might engage in victim-blaming/victim-shaming/victim-disbelieving.)
If people were forced to bet on their beliefs, I think most people would be forced to admit that they do understand this on some level; when you say “try buying this different bike lock” the expected outcome is that the victim is somewhat more likely to go shopping and buy that bike lock, whereas when you say “try wearing less revealing clothing” the expected outcome is that the victim feels crushed and traumatised and stops listening to you. When people give that advice, I don’t think they are actually making the victim any less likely to be raped again—they’re mostly just feeling righteous about saying things that they think the victim should listen to in some abstract sense. (To back this up, a lot of the advice that is most commonly shared—like “don’t wear revealing clothing” or “don’t walk down dark alleys at night” or “shout fire, don’t shout rape”—is basically useless or wrong. Rape is not mostly committed by complete strangers in dark alleys, and covering more skin doesn’t make someone less likely to be raped.)
If rationality was all about making the purest theoretically true statements, then sure, whatever, let’s go ahead and taboo some words. But rationality is about winning, so let’s take context into account and talk about the expected outcomes of our actions.
If you like, just aggregate all the “victim-blaming”/”victim-disbelieving”/”victim-humiliating” things into the question, “From the perspective of the victim who just disclosed something, what is p(this person is about to say or do something unpleasant | this person has said words that sound like unsolicited advice)?”
...Admittedly, I’m not sure the percentage it reports is always accurate; here’s a screenshot of it saying 0% while it clearly has correct guesses in the history (and indeed, a >50% record in recent history). I’m not sure if that percentage is possibly reporting something different to what I’m expecting it to track?
I’m surprised you’re willing to bet money on Aaronson oracles; I’ve played around with them a bit and I can generally get them down to only predicting me around 30-40% correctly. (The one in my open browser tab is currently sitting at 37% but I’ve got it down to 25% before on shorter runs).
I use relatively simple techniques:
- deliberately creating patterns that don’t feel random to humans (like long strings of one answer). I initially did this because I hypothesized that it might have hard-coded in some facts about how humans fail at generating randomness; now I’m not sure why it works; possibly it’s easier for me to control what it predicts this way?
- once a model catches onto my pattern, waiting 2-3 steps before varying it (again, I initially did this because I thought it was more complex than it was, and it might be hypothesizing that I’d change my pattern once it let me know that it caught me; now I know the code’s much simpler and I think this probably just prevents me from making mistakes somehow)
- glancing at objects around me (or tabs in my browser) for semi-random input (I looked around my bedroom and saw a fitness leaflet for F, a screwdriver for D, and a felt jumper for another F)
- changing up the fingers I’m using (using my first two fingers to press F and D produces more predictable input than using my ring and little finger)
- pretending I’m playing a game like osu, tapping out song beats, focusing on the beat rather than the letter choice, and switching which song I’m mimicking every ~line or phrase
- just pressing both keys simultaneously so I’m not consciously choosing which I press first
Knowing how it works seems anti-helpful; I know the code is based on 5-grams, but trying to count 6-long patterns in my head causes its prediction rate to jump to nearly 80%. Trying to do much consciously at all, except something like “open my mind to the universe and accept random noise from my environment”, lets it predict me. But often I have a sort of ‘vibe’ sense for what it’s going to predict, so I pick the opposite of that, and the vibe is correct enough that I listen to it.
There might be a better oracle out there which beats me, but this one seems to have pretty simple code and I would expect most smart people to be able to beat it if they focus on trying to beat the oracle rather than trying to generate random data.
If you’re still happy to take the bet, I expect to make money on it. It would be understandable, however, if you don’t want to take the bet because I don’t believe in CDT.
If you mean this literally, it’s a pretty extraordinary claim! Like, if Alice is really doing important AI Safety work and/or donating large amounts of money, she’s plausibly saving multiple lives every year. Is the impact of being rude worse than killing multiple people per year?
(Note, I’m not saying in this comment that Alice should talk the way she does, or that Alice’s conversation techniques are effective or socially acceptable. I’m saying it’s extraordinary to claim that the toxic experience of Alice’s friends is equivalently bad to “any good she can do herself”. I think basically no amount of rudeness is equivalently bad to how good it is to save a life and/or help avert the apocalypse, but if you think it’s morally equivalent then I’d be really curious for your reasoning.)