I have signed no contracts or agreements whose existence I cannot mention.
The switch to reasoning models does line up well, probably more cleanly. Moved that to main hypothesis, thanks.
Having some later responses makes it less likely they missed the change; I'm curious whether the other responses were closer to Dec or March. I would guess excluding the not-current-researcher one probably makes sense? The datapoint from me is not exactly 2x on this, but 'most of an approximately 2x', so it would need revisiting with the exact question before it could be directly included, and I'd imagine you'd want the source.
I still have some weight on a higher research boost from AI than your model expects, due to other lines of evidence, but I'm not putting quite as much weight on it.
Okay, that updates me some. I'm curious what your alternate guess is for the transition to the faster exponential on the METR long-horizon tasks, and whether you expect that trend to hold up or turn out not to be tracking something important?
(also, please note that via me you now have a very recent datapoint of a frontier AI researcher who thinks the METR speed-up of ~2x was mostly due to AI accelerating research)
Edit: How late in 2024? Because the trendline was only just starting to become apparent even right near the end, and was almost invisible a couple of months earlier, it's pretty plausible to me that if you re-ran that survey now you would get different results. The researchers inside will have had a sense somewhat before releases, but lag on updating is also real.
I’m not claiming economic impact, I’m claiming AI research speed-up. I expect a software-only singularity, economic indicators may well not show dramatic changes before human disempowerment is inevitable.
Try finding chat logs or recordings of you talking to people where this part of you expressed itself with strong emotional resonance, if you have them, and setting aside an afternoon meditating on the experience of re-absorbing the logs. Personal communication can be even more powerful as a mental save-state than broadcast writing, though reading both seems valuable.
Talk to Rob Miles. I think he has a pretty similar structure in some ways, especially the wanting to build neat physical things. I bet you two would have fun with this.
One more thing I’ll share by DM at some point in the next couple weeks.
@Daniel Kokotajlo I think AI 2027 strongly underestimates current research speed-ups from AI. It expects the research speed-up is currently ~1.13x. I expect the true number is more likely around 2x, potentially higher.
Points of evidence:
I’ve talked to someone at a leading lab who concluded that AI getting good enough to seriously aid research engineering is the obvious interpretation of the transition to a faster doubling time on the METR benchmark. I claim advance prediction credit for new datapoints not returning to 7 months, and instead holding out at 4 months. They also expect more phase transitions to faster doubling times; I agree and stake some epistemic credit on this (unsure when exactly, but >50% on this year moving to a faster exponential).
I've spoken to a skilled researcher originally from physics who claims dramatically higher current research throughput: often 2x-10x, plus many projects she'd simply not take on if she had to do everything manually.
The leader of an 80-person engineering company, which employs the two best devs I've worked with, recently told me that for well-specified tasks the latest models are now better than their top devs. He said engineering is no longer a bottleneck.
Regularly hanging around in channels with devs who comment on the latest models, and getting a sense of how much they seem to be speeding people up.
If correct, this propagates through the model to much shorter timelines.
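For concreteness, here is a minimal toy sketch of why the doubling-time difference matters so much. The numbers (a ~1-hour current horizon, a ~167-hour "one work-month" target, and a constant doubling time) are my own illustrative assumptions, not METR's data or the AI 2027 model:

```python
import math

# Toy extrapolation: how long until AI task horizons reach "a month of work",
# under a constant doubling time. All numbers are illustrative assumptions.

def months_to_reach(target_hours, current_hours, doubling_time_months):
    """Months until the task horizon grows from current_hours to target_hours,
    assuming constant exponential growth with the given doubling time."""
    doublings_needed = math.log2(target_hours / current_hours)
    return doublings_needed * doubling_time_months

# Assume a ~1 hour current horizon and a ~167 hour (one work-month) target.
for doubling_time in (7, 4):
    m = months_to_reach(target_hours=167, current_hours=1,
                        doubling_time_months=doubling_time)
    print(f"{doubling_time}-month doubling time -> ~{m:.0f} months to month-long tasks")
```

Under these made-up numbers the 4-month doubling time pulls the month-long-task milestone in by roughly two years (~52 vs ~30 months), and if the faster exponential is real and partly driven by AI accelerating research, further speed-ups compound on top of that.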
Please do an epistemic spot check on these numbers by talking to representative people in ways that would turn up evidence about current speed-ups.[1] Edit: Eli said he'd be enthusiastic for someone else to get fresh data; I'm going to take a shot at this.
(also @Scott Alexander @Thomas Larsen @elifland @romeo @Jonas V)
[1] You might so far have mostly been talking to the very best researchers, who are probably getting (or at least claiming; obvious reason for tinted glasses here) smaller speed-ups?
You might need to sandbox information: a high-weight-class mind could find memes which fit into a small mind, and a small mind which is causally downstream of the powerful mind would have an unfair advantage. Without this, it's basically a game of powerful minds nearest-unblocked-neighbour-ing around any rules to pack their insights down small, and smaller minds collecting them.
People have tried to measure the information throughput of biological humans. The very highest estimates, which come from image recognition tasks, are around 50 bits per second, and most estimates are more like 10 bits per second.
This is just the “inner brain”, the paper also introduces the “outer brain”. It’s thoroughly plausible that information not routed through the inner brain’s cohesive agency but still partially processed can leak via e.g. subtle facial cues or tone, likely resulting in the experience of a much higher bitrate in things like Circling or other interpersonal meditative practices.
I’ve noticed some interesting posts about using the framework of quantum field theory for mechanistic interpretability.
oh man, if the QFT math actually describes some dynamics of neural networks well, this is going to add a hilarious extra layer to the woo / scientism memetic war over the word "quantum" for brain stuff.
Why assume Gaussian or sub-Gaussian error? I'd naively expect the error to land on weird edge cases which end up pretty far from the intended utility function, growing as the intelligence can explore more of the space?
(worthwhile area to be considering, tho)
This and your other recent post have raised my opinion of you significantly. I’m still pretty concerned about the capabilities outcomes of interpretability, both the general form of being able to open the black box enough to find unhobblings accelerating timelines, and especially automated interp pipelines letting AIs optimize their internals[1] as a likely trigger for hard RSI → game over, but it’s nice to see you’re seeing more of the strategic landscape than seemed apparent before.
[1] please don't do this
You seem to be in a different bubble to me in remarkably many ways, given the overlap of topics.
Last year I read through the past ~4 years of OpenPhil grants, was briefly reassured by seeing a bunch of good grants, then noticed that almost all of the ones going to places doing work that might plausibly help with superintelligence were made before Holden left. Then I was much less reassured.
Agree that probably an overly large portion of group attention is on this number. Agree that changes to p(doom) are not generally very interesting or worthwhile.
However p(doom) convos in general have some notable upsides that seem missing here:
It doesn’t let policymakers, alignment researchers, engineers or others improve their decision-making, or help them in anticipating the future.
Seems overstated, I would take very very different actions if I had a p(doom) below 20%, because those worlds have a different set of major bottlenecks.
Strongly agree that exchanging gears models is the actually useful thing, but I find that hearing someone's p(doom) is an excellent shortcut to which gears they have and which they're likely missing, which helps shape the conversation.
This is actually pretty cool! Feels like it’s doing the type of reasoning that might result in critical insight, and maybe even is one itself. It’s towards the upper tail of the distribution of research I’ve read by people I’m not already familiar with.
I think there are big challenges to this solving AGI alignment, including that this restriction probably bounds the AI's power a lot, but it still feels like a neat idea and I hope you continue to explore the space of possible solutions.
Oh nice, stavros already got it before I posted :)
This is the path forward.
I have had and solved fairly extreme versions of this in myself, and have helped people with debilitating versions of this resolve it multiple times.
You're stuck in a loop where some part of you is pushing to do the object-level thing so hard that it has no sensitivity to the parts of you that are averse to it. Whenever you notice you're spinning your wheels, stop trying to force through the object-level action and let yourself actually notice the feeling of resistance with open curiosity. Let it unfold into the full message that brain fragment is trying to send, rather than the overcompressed "bad"/"aversion".
Yes, I’m imagining if the author link-posts they can add a cross-link so viewers can participate.
No public comments will be hosted on our website as we don’t have the resources for moderation of public discussion. Authors can choose to link-post their work on the Alignment Forum or LessWrong to engage with a broader audience.
I think it'd be pretty important/useful if the UI showed links to publicly commentable link-posts where those exist.
Seconding this: if you succeed hard enough at automating empirical alignment research without automating the theoretical / philosophical foundations, you probably end up initiating RSI without a well-grounded framework that has a chance of holding up to superintelligence. Automated interpretability seems especially likely to cause this kind of failure, even if it has some important benefits.
This seems like a very reasonable thing for AISI alignment to focus on. It’s shovel ready, has some good properties, probably scales to high enough intelligence that you can get useful research out of them before it breaks, and you’re correctly acknowledging up-front that this won’t be enough to handle full superintelligence.[1]
One thing I hope you’re tracking, and devote adequate attention to, is figuring out what kinds of research or strategy questions you plan to ask your debate system to try and end the acute risk period. You don’t want to have to do that on the fly at crunch time, and once a good version of this is available to do research for alignment it will be available to do capabilities work too, which means you probably don’t have long before someone sets off RSI.
At high enough power levels simulated or real human judges are the weak point, even if the rest of the protocol holds up.