I work at the Alignment Research Center (ARC). I write a blog on stuff I’m interested in (such as math, philosophy, puzzles, statistics, and elections): https://ericneyman.wordpress.com/
Eric Neyman
Thanks for writing this. I think this topic is generally a blind spot for LessWrong users, and it’s kind of embarrassing how little thought this community (myself included) has given to the question of whether a typical future with human control over AI is good.
(This actually broadens the question slightly compared to yours, because you talk about “a human” taking over the world with AGI, and make guesses about the personality of such a human after conditioning on them deciding to do that. But I’m not even confident that AGI-enabled control of the world by e.g. the US government would be good.)
Concretely, I think that a common perspective people take is: “What would it take for the future to go really really well, by my lights?”, and the answer to that question probably involves human control of AGI. But that’s not really the action-relevant question. The action-relevant question, for deciding whether you want to try to solve alignment, is how the average world with human-controlled AGI compares to the average AGI-controlled world. And… I don’t know, in part for the reasons you suggest.
Cool, you’ve convinced me, thanks.
Edit: well, sort of. I think it depends on what information you’re allowing yourself to know when building your statistical model. If you’re not letting yourself make guesses about how the LW population was selected, then I still think the SAT thing and the height thing are reasonable. However, if you’re actually trying to figure out an estimate of the right answer, you probably shouldn’t blind yourself quite that much.
These both seem valid to me! Now, if you have multiple predictors (like SAT and height), then things get messy because you have to consider their covariance and stuff.
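To gesture at what “consider their covariance” means here, here’s a minimal sketch (my illustration, not something from the thread): it assumes the latent trait and the two predictors are jointly Gaussian and standardized, with made-up correlations, and computes the posterior mean of the trait given both predictors at once.

```python
# Minimal sketch with made-up numbers: shrinking an estimate of a latent trait
# given two correlated predictors, assuming everything is jointly Gaussian and
# standardized (mean 0, variance 1).
import numpy as np

rho_trait_sat, rho_trait_height, rho_sat_height = 0.8, 0.2, 0.2  # assumed correlations

Sigma_xx = np.array([[1.0, rho_sat_height],
                     [rho_sat_height, 1.0]])             # covariance of the predictors
Sigma_zx = np.array([rho_trait_sat, rho_trait_height])   # covariance of trait with predictors

x = np.array([2.0, 1.0])  # observed SAT and height, in standard deviations above the mean

# Conditional mean of the trait: E[Z | X = x] = Sigma_zx @ inv(Sigma_xx) @ x
posterior_mean = Sigma_zx @ np.linalg.solve(Sigma_xx, x)
print(posterior_mean)  # ≈ 1.63, not the naive 0.8*2 + 0.2*1 = 1.8,
                       # because the two predictors partly carry the same information
```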
Yup, I think that only about 10-15% of LWers would get this question right.
Yeah, I wonder if Zvi used the wrong model (the non-thinking one)? It’s specifically the “thinking” model that gets the question right.
Just a few quick comments about my “integer whose square is between 15 and 30” question (search for my name in Zvi’s post to find his discussion):
The phrasing of the question I now prefer is “What is the least integer whose square is between 15 and 30?”, because it makes it unambiguous that the answer is −5 rather than 4. (“Least” is used this way in competition math, so the model is familiar with it.) Asking for the “smallest” integer leaves it ambiguous whether −5 or 4 is meant, since −5 is less but 4 is smaller in magnitude.
This Gemini model answers −5 to both phrasings. As far as I know, no previous model ever said −5 regardless of phrasing, although someone said o1 Pro gets −5. (I don’t have a subscription to o1 Pro, so I can’t independently check.)
I’m fairly confident that a majority of elite math competitors (top 500 in the US, say) would get this question right in a math competition (although maybe not in a casual setting where they aren’t on their toes).
But also this is a silly, low-quality question that wouldn’t appear in a math competition.
Does a model getting this question right say anything interesting about it? I think a little. There’s a certain skill of being careful to not make assumptions (e.g. that the integer is positive). Math competitors get better at this skill over time. It’s not that straightforward to learn.
I’m a little confused about why Zvi says that the model gets it right in the screenshot, given that the model’s final answer is 4. But it seems like the model snatched defeat from the jaws of victory? Like if you cut off the very last sentence, I would call it correct.
Here’s the output I get:
Thank you for making this! My favorite ones are 4, 5, and 12. (Mentioning this in case anyone wants to listen to a few songs but not the full Solstice.)
Yes, very popular in these circles! At the Bay Area Secular Solstice, the Bayesian Choir (the rationalist community’s choir) performed Level Up in 2023 and Landsailor this year.
My Spotify Wrapped
Yeah, I agree that that could work. I (weakly) conjecture that they would get better results by doing something more like the thing I described, though.
My random guess is:
The dark blue bar corresponds to the testing conditions under which the previous SOTA was 2%.
The light blue bar doesn’t cheat (e.g. it doesn’t let the model run many times and then count it as correct if any one run succeeds), but it spends more compute than one would realistically spend (e.g. more than it would cost to pay a mathematician to solve the problem), perhaps by running the model 100 to 1000 times and then having the model look at all the runs and try to figure out which run had the most compelling-seeming reasoning.
What’s your guess about the percentage of NeurIPS attendees from anglophone countries who could tell you what AGI stands for?
I just donated $5k (through Manifund). Lighthaven has provided a lot of value to me personally, and more generally it seems like a quite good use of money in terms of getting people together to discuss the most important ideas.
More generally, I was pretty disappointed when Good Ventures decided not to fund what I consider to be some of the most effective spaces, such as AI moral patienthood and anything associated with the rationalist community. This has created a funding gap that I’m pretty excited about filling. (See also: Eli’s comment.)
Consider pinning this post. I think you should!
I was today years old when I realized that reading a book and watching a movie are visually similar experiences for some people!
Let’s test this! I made a Twitter poll.
Oh, that’s a good point. Here’s a freehand map of the US I drew last year (just the borders, not the outline). I feel like I must have been using my mind’s eye to draw it.
I think very few people have a very high-fidelity mind’s eye. I think the reason that I can’t draw a bicycle is that my mind’s eye isn’t powerful/detailed enough to be able to correctly picture a bicycle. But there’s definitely a sense in which I can “picture” a bicycle, and the picture is engaging something sort of like my ability to see things, rather than just being an abstract representation of a bicycle.
(But like, it’s not quite literally a picture, in that I’m not, like, hallucinating a bicycle. Like it’s not literally in my field of vision.)
Huh! For me, physical and emotional pain are two super different clusters of qualia.
I think this isn’t the sort of post that ages well or poorly, because it isn’t topical, but I think this post turned out pretty well. It gradually builds from preliminaries that most readers have probably seen before, into some pretty counterintuitive facts that aren’t widely appreciated.
At the end of the post, I listed three questions and wrote that I hope to write about some of them soon. I never did, so I figured I’d use this review to briefly give my takes.
This comment from Fabien Roger tests some of my modeling choices for robustness, and finds that the surprising results of Part IV hold up when the noise is heavier-tailed than the signal. (I’m sure there’s more to be said here, but I probably don’t have time to do more analysis by the end of the review period.)
My basic take is that this really is a point in favor of well-evidenced interventions, but that the best-looking speculative interventions are nevertheless better. This is because I think “speculative” here mostly refers to partial measurement rather than noisy measurement. For example, maybe you can only foresee the first-order effects of an intervention, but not the second-order effects. If the first-order effect is a (known) quantity X1 and the second-order effect is an (unknown) quantity X2, then modeling the second-order effect as zero (and thus estimating the quality of the intervention as X1) isn’t a noisy measurement; it’s a partial measurement. It’s still your best guess given the information you have.
I haven’t thought this through very much. I expect good counter-arguments and counter-counter-arguments to exist here.
No—or rather, only if the measurement is guaranteed to be exactly correct. To see this, observe that the variance of a noisy, unbiased measurement is greater than the variance of the quantity you’re trying to measure (with equality only when the noise is zero), whereas the variance of a noiseless, partial measurement is less than the variance of the quantity you’re trying to measure.
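To spell out the variance claim (my notation, not from the original comment): write $Q$ for the quantity you’re trying to measure, $\varepsilon$ for independent mean-zero noise, and $Y$ for whatever a partial measurement lets you observe. Then

$$\mathrm{Var}(Q + \varepsilon) = \mathrm{Var}(Q) + \mathrm{Var}(\varepsilon) \ge \mathrm{Var}(Q),$$

with equality only when the noise is zero, whereas by the law of total variance

$$\mathrm{Var}\big(\mathbb{E}[Q \mid Y]\big) = \mathrm{Var}(Q) - \mathbb{E}\big[\mathrm{Var}(Q \mid Y)\big] \le \mathrm{Var}(Q).$$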
Real-world measurements are absolutely partial. They are, like, mind-bogglingly partial. This point deserves a separate post, but consider for instance the action of donating $5,000 to the Against Malaria Foundation. Maybe your measured effect from the RCT is that it’ll save one life: 50 QALYs or so. But this measurement neglects the meat-eating problem: the expected-child you’ll save will grow up to eat expected-meat from factory farms, likely causing a great amount of suffering. But then you remember: actually there’s a chance that this child will have a one eight-billionth stake in determining the future of the lightcone. Oops, actually this consideration totally dominates the previous two. Does this child have better values than the average human? Again: mind-bogglingly partial!
(The measurements are also, of course, noisy! RCTs are probably about as un-noisy as it gets: for example, making your best guess about the quality of an intervention by drawing inferences from uncontrolled macroeconomic data is much more noisy. So the answer is: generally both noisy and partial, but in some sense, much more partial than noisy—though I’m not sure how much that comparison matters.)
The lessons of this post do not generalize to partial measurements at all! This post is entirely about noisy measurements. If you’ve partially measured the quality of an intervention, estimating the un-measured part using your prior will give you an estimate of intervention quality that you know is probably wrong, but the expected value of your error is zero.
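To make the contrast concrete, here’s a small simulation (my own illustration, with made-up standard-normal distributions, not anything from the post): both kinds of estimates are unbiased unconditionally, but only the noisy one needs to be shrunk toward the prior when it happens to look unusually good.

```python
# Contrast a partial measurement (observe X1 of Q = X1 + X2) with a noisy
# measurement (observe M = Q + noise). All distributions are made up (standard normal).
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

# Partial: estimate Q by X1 plus the prior mean (0) of the unmeasured part X2.
x1, x2 = rng.normal(0, 1, n), rng.normal(0, 1, n)
q_partial, est_partial = x1 + x2, x1

# Noisy: estimate Q by the measurement M = Q + noise itself.
q_noisy = rng.normal(0, 1, n)
est_noisy = q_noisy + rng.normal(0, 1, n)

# Unconditionally, both errors average to zero.
print(np.mean(est_partial - q_partial), np.mean(est_noisy - q_noisy))  # both ~0

# Conditional on the estimate being in the top 1%, only the noisy estimate
# systematically overstates true quality (i.e. needs regression to the mean).
top_p = np.argsort(est_partial)[-n // 100:]
top_n = np.argsort(est_noisy)[-n // 100:]
print(np.mean(est_partial[top_p] - q_partial[top_p]))  # still ~0
print(np.mean(est_noisy[top_n] - q_noisy[top_n]))      # clearly positive
```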