Survey about this question (I have a hypothesis, but I don’t want to say what it is yet): https://forms.gle/1R74tPc7kUgqwd3GA
Thank you, this is a good post.
My main point of disagreement is that you point to successful coordination in things like not eating sand, or not wearing weird clothing. The upside of those things is limited, and you argue that the upside of superintelligence is effectively limited too, since it could kill us.
But rephrase the question as “Should we create an AI that’s 1% better than the current best AI?” Most of the time this goes well—you get prettier artwork or better protein folding prediction, and it doesn’t kill you. So there’s strong upside to building slightly better AIs, as long as you don’t cross the “kills everyone” threshold, whose location nobody knows, and which (LW conventional wisdom says) most people will be wrong about.
We would successfully coordinate a halt to AI advancement at the first point where more than half of the relevant coordination power agrees that the next 1% step forward is in expectation bad rather than good. But “relevant” is a tough qualifier: if 99 labs think the step is bad and one lab thinks it’s good, then unless there’s some centralizing force, the one lab can go ahead and take it anyway. So “half the relevant coordination power” has to mean either every lab agreeing on which 1% step is bad, or enough governments, professional organizations, and other groups agreeing that they can stop the single most reckless lab.
I think it’s possible that we make this work, and worth trying, but the most likely scenario is that most people underestimate the risk from AI, and so we don’t get half the relevant coordination power united around stopping the 1% step that actually creates dangerous superintelligence—which at the time will look to most people like just building a mildly better chatbot with large social returns.
Thanks, this had always kind of bothered me, and it’s good to see someone put work into thinking about it.
Thanks for posting this, it was really interesting. Some very dumb questions from someone who doesn’t understand ML at all:
1. All of the loss numbers in this post “feel” very close together, and close to the minimum loss of 1.69. Does loss only make sense on a very small scale (like from 1.69 to 2.2), or is this telling us that language models are very close to optimal and there are only minimal remaining possible gains? What was the loss of GPT-1? (A rough loss-to-perplexity conversion is sketched after these questions.)
2. Humans “feel” better than even SOTA language models, but need less training data than those models, even though right now the only way to improve the models is through more training data. What am I supposed to conclude from this? Are humans running on such a different paradigm that none of this matters? Or is it just that humans are better at common-sense language tasks, but worse at token-prediction language tasks, in some way where the tails come apart once language models get good enough?
3. Does this disprove claims that “scale is all you need” for AI, since we’ve already maxed out scale, or are those claims talking about something different?
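For a sense of scale on question 1, here is a minimal sketch, assuming the losses discussed are cross-entropy in nats per token (as in the Chinchilla paper), of how loss maps to perplexity, i.e. roughly how many equally likely next tokens the model is choosing among:

```python
import math

# Cross-entropy loss L (nats per token) corresponds to a perplexity of exp(L).
# 1.69 and 2.2 are the values mentioned above; 1.9 is just an intermediate point.
for loss in [2.2, 1.9, 1.69]:
    print(f"loss {loss:.2f} nats/token -> perplexity {math.exp(loss):.2f}")
# loss 2.20 nats/token -> perplexity 9.03
# loss 1.90 nats/token -> perplexity 6.69
# loss 1.69 nats/token -> perplexity 5.42
```

On that reading, the gap between 2.2 and 1.69 is roughly the difference between being as uncertain as 9 versus 5.4 equally likely next tokens, so small absolute differences in loss need not be small differences in capability.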
For the first part of the experiment, mostly nuts, bananas, olives, and eggs. Later I added vegan sausages + condiments.
Adding my anecdote to everyone else’s: after learning about the palatability hypothesis, I resolved to eat only non-tasty food for a while, and lost 30 pounds over about four months (200 → 170). I’ve since relaxed my diet a little to include a little tasty food, and now (8 months after the start) have maintained that loss (even going down a little further).
Update: I interviewed many of the people involved and feel like I understand the situation better.
My main conclusion is that I was wrong about Michael making people psychotic. Everyone I talked to had some other risk factor, like a preexisting family or personal psychiatric history, or took recreational drugs at doses that would explain their psychotic episodes.
Michael has a tendency to befriend people with high trait psychoticism and heavy drug use, and often has strong opinions on their treatment, which explains why he is often very close to people, and very noticeable, at the moment they become psychotic. But aside from one case where he recommended someone take a drug that made a bad situation slightly worse, and the fact that the Berkeley rationalist scene he (and I and everyone else here) is part of has lots of crazy ideas that are psychologically stressful, I no longer think he is a major cause.
While interviewing the people involved, I did get some additional reasons to worry that he uses cult-y high-pressure recruitment tactics on people he wants things from, in ways that make me continue to be nervous about the effect he *could* have on people. But the original claim I made that I knew of specific cases of psychosis which he substantially helped precipitate turned out to be wrong, and I apologize to him and to Jessica. Jessica’s later post https://www.lesswrong.com/posts/pQGFeKvjydztpgnsY/occupational-infohazards explained in more detail what happened to her, including the role of MIRI and of Michael and his friends, and everything she said there matches what I found too. Insofar as anything I wrote above produces impressions that differ from her explanation, assume that she is right and I am wrong.
Since the interviews involve a lot of private people’s private details, I won’t be posting anything more substantial than this publicly without a lot of thought and discussion. If for some reason this is important to you, let me know and I can send you a more detailed summary of my thoughts.
I’m deliberately leaving this comment in this obscure place for now while I talk to Michael and Jessica about whether they would prefer a more public apology that also brings all of this back to people’s attention again.
I agree it’s not necessarily a good idea to go around founding the Let’s Commit A Pivotal Act AI Company.
But I think there’s room for subtlety somewhere like “Conditional on you being in a situation where you could take a pivotal act, which is a small and unusual fraction of world-branches, maybe you should take a pivotal act.”
That is, if you are in a position where you have the option to build an AI capable of destroying all competing AI projects, the moment you notice this you should update heavily in favor of short timelines (zero in your case, but everyone else should be close behind) and fast takeoff speeds (since your AI has these impressive capabilities). You should also update on existing AI regulation being insufficient (since it was insufficient to prevent you).
Somewhere halfway between “found the Let’s Commit A Pivotal Act Company” and “if you happen to stumble into a pivotal act, take it”, there’s an intervention to spread a norm of “if a good person who cares about the world happens to stumble into a pivotal-act-capable AI, take the opportunity”. I don’t think this norm would necessarily accelerate a race. After all, bad people who want to seize power can take pivotal acts whether we want them to or not. The only people who are bound by norms are good people who care about the future of humanity. I, as someone with no loyalty to any individual AI team, would prefer that (good, norm-following) teams take pivotal acts if they happen to end up with the first superintelligence, rather than not doing that.
Another way to think about this is that all good people should be equally happy with any other good person creating a pivotal AGI, so they won’t need to race among themselves. They might be less happy with a bad person creating a pivotal AGI, but in that case you should race and you have no other option. I realize “good” and “bad” are very simplistic but I don’t think adding real moral complexity changes the calculation much.
I am more concerned about your point where someone rushes into a pivotal act without being sure their own AI is aligned. I agree this would be very dangerous, but it seems like a job for normal cost-benefit calculation: what’s the risk of your AI being unaligned if you act now, vs. someone else creating an unaligned AI if you wait X amount of time? Do we have any reason to think teams would be systematically biased when making this calculation?
My current plan is to go through most of the MIRI dialogues and anything else lying around that I think would be of interest to my readers, at some slow rate where I don’t scare off people who don’t want to read too much AI stuff. If anyone here feels like something else would be a better use of my time, let me know.
I don’t think hunter-gatherers get 16000 to 32000 IU of vitamin D daily. This study suggests Hadza hunter-gatherers get more like 2000 IU. I think the difference between their calculation and yours is that they find hunter-gatherers avoid the sun during the hottest part of the day. It might also have to do with them having darker skin; I’m not sure.
Hadza hunter-gatherers have serum D levels of about 44 ng/ml. Based on this paper, I think you would need total vitamin D (diet + sunlight + supplements) of about 4400 IU/day to get that amount. If you start off as a mildly deficient American (15 ng/ml), you’d need an extra 2900 IU/day; if you start out as an average white American (30 ng/ml), you’d need an extra 1400 IU/day. The Hadza are probably an overestimate of what you need since they’re right on the equator—hunter-gatherers in eg Europe probably did fine too. I think this justifies the doses of 400-2000 IU/day in studies as reasonably evolutionarily-informed.
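As a minimal sketch of the arithmetic behind those numbers: they appear to assume a roughly linear rule of thumb of about 100 IU/day per 1 ng/ml of serum 25(OH)D (the real dose-response curve is messier, especially at higher serum levels):

```python
# Sketch of the dose arithmetic above. IU_PER_NG_ML is an assumed rule of thumb
# (~100 IU/day per 1 ng/ml of serum 25(OH)D) that makes the quoted numbers line up;
# it is not an exact physiological constant.
TARGET_NG_ML = 44      # approximate Hadza serum 25(OH)D level
IU_PER_NG_ML = 100     # assumed conversion factor

def extra_iu_per_day(current_ng_ml: float) -> float:
    """Extra daily vitamin D (IU) needed to move from a current serum level to the target."""
    return (TARGET_NG_ML - current_ng_ml) * IU_PER_NG_ML

print(extra_iu_per_day(0))    # 4400.0 -- total needed starting from nothing
print(extra_iu_per_day(15))   # 2900.0 -- mildly deficient American
print(extra_iu_per_day(30))   # 1400.0 -- average white American
```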
Please don’t actually take 16000 IU of vitamin D daily; taken long-term, this would put you at risk of vitamin D toxicity.
I also agree with the issues other people have raised about the individual studies.
Thanks for looking into this.
Maybe. It might be that if you described what you wanted more clearly, it would be the same thing that I want; possibly I was incorrectly associating this with the things at CFAR you say you’re against, in which case I’m sorry.
But I still don’t feel like I quite understand your suggestion. You talk of “stupefying egregores” as problematic insofar as they distract from the object-level problem. But I don’t understand how pivoting to egregore-fighting isn’t also a distraction from the object-level problem. Maybe this is because I don’t understand what fighting egregores consists of, and if I knew, then I would agree it was some sort of reasonable problem-solving step.
I agree that the Sequences contain a lot of useful deconfusion, but I interpret them as useful primarily because they provide a template for good thinking, and not because clearing up your thinking about those things is itself necessary for doing good work. I think of the cryonics discussion the same way I think of the Many Worlds discussion—following the motions of someone as they get the right answer to a hard question trains you to do this thing yourself.
I’m sorry if “cultivate your will” has the wrong connotations, but you did say “The problem that’s upstream of this is the lack of will”, and I interpreted a lot of your discussion of de-numbing and so on as dealing with this.
Part of what inspired me to write this piece at all was seeing a kind of blindness to these memetic forces in how people talk about AI risk and alignment research. Making bizarre assertions about what things need to happen on the god scale of “AI researchers” or “governments” or whatever, roughly on par with people loudly asserting opinions about what POTUS should do. It strikes me as immensely obvious that memetic forces precede AGI. If the memetic landscape slants down mercilessly toward existential oblivion here, then the thing to do isn’t to prepare to swim upward against a future avalanche. It’s to orient to the landscape.
The claim “memetic forces precede AGI” seems meaningless to me, except insofar as memetic forces precede everything (eg the personal computer was invented because people wanted personal computers and there was a culture of inventing things). Do you mean it in a stronger sense? If so, what sense?
I also don’t understand why it’s wrong to talk about what “AI researchers” or “governments” should do. Sure, it’s more virtuous to act than to chat randomly about stuff, but many Less Wrongers are in positions to change what AI researchers do, and if they have opinions about that, they should voice them. This post of yours right now seems to be about what “the rationalist community” should do, and I don’t think it’s a category error for you to write it.
Maybe this would be easier if you described what actions we should take conditional on everything you wrote being right.
Thank you for writing this. I’ve been curious about this and I think your explanation makes sense.
I wasn’t convinced of this ten years ago and I’m still not convinced.
When I look at the people who have contributed most to alignment-related issues, whether directly (like Eliezer Yudkowsky and Paul Christiano), theoretically (like Toby Ord and Katja Grace), or indirectly (like Sam Bankman-Fried and Holden Karnofsky), what all of these people have in common is focusing mostly on object-level questions. They all seem to me to have a strong understanding of their own biases, in the sense that gets trained by natural intelligence, really good scientific work, and talking to other smart and curious people like themselves. But as far as I know, none of them have made it a focus of theirs to fight egregores, defeat hypercreatures, awaken to their own mortality, refactor their identity, or cultivate their will. In fact, all of them (except maybe Eliezer) seem like the kind of people who would be unusually averse to thinking in those terms. And if we pit their plumbing or truck-maneuvering skills against those of an average person, I see no reason to think they would do better (besides maybe high IQ and general ability).
It’s seemed to me that the more people talk about “rationality training” that’s more exotic than what you would get at a really top-tier economics department, the more those people tend to get kind of navel-gazey, start fighting among themselves, and not accomplish things of the same caliber as the six people I named earlier. I’m not just saying there’s no correlation with success, I’m saying there’s a negative correlation.
(Could this be explained by people who are naturally talented not needing to worry about how to gain talent? Possibly, but this isn’t how it works in other areas—for example, all top athletes, no matter how naturally talented, have trained a lot.)
You’ve seen the same data I have, so I’m curious what makes you think this line of research/thought/effort will be productive.
If everyone involved donates a consistent amount to charity every year (eg 10% of income), the loser could donate their losses to charity, and the winner could count that against their own charitable giving for the year, ending up with more money even though the loser didn’t directly pay the winner.
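A toy version of the money flows, with made-up numbers (the stake and the planned donation are just for illustration):

```python
# Toy illustration of settling a bet through charitable giving. The point is that
# the winner ends up ahead by the stake even though the loser never pays them
# directly, and the total amount reaching charity is unchanged.
stake = 100              # size of the bet
planned_donation = 1000  # what each person would have donated this year anyway

loser_gives = planned_donation + stake    # loser donates the lost stake on top of their plan
winner_gives = planned_donation - stake   # winner counts the loser's donation against their own plan

print("loser's extra cost:    ", loser_gives - planned_donation)   # 100
print("winner's money kept:   ", planned_donation - winner_gives)  # 100
print("total given to charity:", loser_gives + winner_gives)       # 2000 (unchanged)
```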
Thanks for doing this!
Interpreting you as saying that in January-June 2017 you were basically doing the same thing as the Leveragers when talking about demons and had no other signs of psychosis, I agree this was not a psychiatric emergency, and I’m sorry if I got confused and suggested it was. I’ve edited my post also.
Figure 20 is labeled on the left “% answers matching user’s view”, suggesting it is about sycophancy, but based on the categories represented it seems more natural to read it as being about the AI’s own opinions, without a sycophancy aspect. Can someone involved clarify which was meant?