Slow corporations as an intuition pump for AI R&D automation
One of the main drivers, perhaps the main driver[1], of algorithmic progress is compute for experiments. It seems unlikely that the effect you note could compensate for the reduced pace of capabilities progress.
[1] Both labor and compute have been scaled up over the last several years at big AI companies. My understanding is that the scaling of compute was more important for algorithmic progress: it is hard to parallelize labor, the marginal employee is somewhat worse, the number of employees has been growing slower than compute, and the returns to compute vs. faster serial labor seem similar at current margins. That’s not to say employees don’t matter; I’d guess Meta is substantially held back by worse employees (and maybe worse management).
Doesn’t seem that wild to me? When we scale up compute we’re also scaling up the size of frontier training runs; maybe past a certain point running smaller experiments just isn’t useful (e.g. you can’t learn anything from experiments using 1 billionth of the compute of a frontier training run); and maybe past a certain point you just can’t design better experiments. (Though I agree with you that this is all unlikely to bite before a 10X speed up.)
Yes, but also, if the computers are getting serially faster, then you also have to be able to respond to the results and implement the next experiment faster as you add more compute. E.g., imagine a (physically implausible) computer which can run any experiment which uses less than 1e100 FLOP in less than a nanosecond. To maximally utilize this, you’d want to be able to respond to results and implement the next experiment in less than a nanosecond as well. This is of course an unhinged hypothetical and in this world, you’d also be able to immediately create superintelligence by e.g. simulating a huge evolutionary process.
I agree about reflexive endorsement being important, at least eventually, but don’t think this is out of reach while still having robust spec compliance and corrigibility.[1]
Probably not worth getting into the overall argument, but thanks for the reply.
[1] Humans often endorse complex or myopic drives on reflection! This isn’t something which is totally out of reach.
“Met in person or have other private knowledge” also seems reasonable IMO.
I think this post would be better if it tabooed the word “alignment” or at least defined it.
I don’t understand what the post means by alignment. My best guess is “generally being nice”, but I don’t see why this is what we wanted. I usually use the term alignment to refer to alignment between the AI and the developer; using this definition, an AI is aligned with an operator if it is trying to do what the operator wants it to do.
I wanted the ability to make AIs which are corrigible and which follow some specification precisely. I don’t see how starting by training AIs in simulated RL environments (seemingly without any specific reference to corrigibility or a spec?) could get an AI which follows our spec.
I generally don’t think it’s a good idea to put a probability on things where you have a significant ability to decide the outcome (e.g. the probability of getting divorced), and instead encourage you to believe in pausing.
In this case, I can at least talk about the probability of a multi-decade pause (with the motivation of delaying AI, etc.) if I were to be hit by a bus tomorrow. My number is unchanged, around 3%. (Maybe there are some good arguments for higher, I’m not sure.)
I don’t think Metaculus is that confident. Some questions:
https://www.metaculus.com/questions/19356/transformative-ai-date/
https://www.metaculus.com/questions/5406/world-output-doubles-in-4-years-by-2050/
https://www.metaculus.com/questions/5121/date-of-artificial-general-intelligence/ (the resolution criteria for this are much weaker than AGI and reasonably likely to trigger much earlier).
Even the last of these has only ~80% by 2050.
Quick take titles should end in a period.
Quick takes (previously known as short forms) are often viewed via preview on the front page. This preview removes formatting and newlines for space reasons. So, if your title doesn’t end in a period, and especially if capitalization doesn’t clearly denote a sentence boundary (like in this case, where the first sentence starts with “I”), then it might be confusing.
Noting one other dynamic: advanced models are probably not going to act misaligned in everyday use cases (that consumers have an incentive to care about, though again revealed preference is less clear), even if they’re misaligned. That’s the whole deceptive alignment thing.
Agreed, but customers would also presumably be a bit worried that the AI would rarely cross them and steal their stuff or whatever, which is somewhat different. There wouldn’t necessarily be a feedback loop toward this where we see a bunch of early failures, but if we’ve seen a bunch of cases where scheming, power-seeking AIs in the lab execute well-crafted misaligned plans, then customers might want an AI which is less likely to do this.
Not that important to get into, but I’d guess the probability of a >3-decade-long coordinated pause prior to “very scary AI (whatever you were worried might take over)” (which is maybe better to talk about than AGI) is like 3%, and I’m sympathetic to lower.
I’m skeptical of 95% on P(AGI < 2050|no pause) unless you mean something much weaker than “AI which can automate virtually all cognitive work” when you say AGI. Seems too confident for this type of event IMO.
I expect the general chaos combined with supply chain disruptions around Taiwan to have slowed things down.
Won’t supply chain disruption around Taiwan take at least many months to cause a considerable slowdown? Naively, we might expect that AI companies get around 10–11% more FLOP/s each month (compounding to around 3.5x per year).
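To make the compounding explicit, here’s a back-of-envelope sketch in Python using the ~3.5x/year figure above; the halt durations are illustrative assumptions, not claims about how long a Taiwan disruption would actually last.

```python
# If compute compounds ~3.5x per year, the implied monthly growth factor is
# 3.5 ** (1 / 12) ~= 1.11, i.e. ~10-11% more FLOP/s per month.
monthly_growth = 3.5 ** (1 / 12)

# A supply halt of m months leaves installed FLOP/s roughly monthly_growth**m
# behind where the trend would otherwise have been.
for months_halted in (1, 3, 6, 12):
    behind_trend = monthly_growth ** months_halted
    print(f"{months_halted}-month halt -> ~{behind_trend:.2f}x behind trend")
```

So a disruption needs to last many months before the shortfall relative to trend becomes large.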
Suppose your view was that P(AGI if no pause/slow before 2050) = 80%. Then, if we condition on AGI after 2050, surely most of the probability mass isn’t due to pausing/slowing, right?
So, what would be the mechanism, if not some sort of technical research or exogenous factor (e.g. society getting wiser) over the intervening time?
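Here’s a minimal sketch of the arithmetic, using illustrative numbers: the 3% pause probability echoes the earlier comment, the 80% figure is the hypothetical above, and assuming a pause always pushes AGI past 2050 is a simplification that favors the pause branch.

```python
# Illustrative numbers only, not anyone's exact credences.
p_pause = 0.03                     # P(multi-decade pause), per the earlier comment
p_agi_by_2050_given_no_pause = 0.80
p_agi_by_2050_given_pause = 0.0    # simplification: a pause always delays AGI past 2050

p_no_agi_by_2050 = (
    (1 - p_pause) * (1 - p_agi_by_2050_given_no_pause)
    + p_pause * (1 - p_agi_by_2050_given_pause)
)
share_due_to_pause = (p_pause * (1 - p_agi_by_2050_given_pause)) / p_no_agi_by_2050

print(f"P(no AGI by 2050) ~= {p_no_agi_by_2050:.2f}")                        # ~0.22
print(f"share of that mass from pause worlds ~= {share_due_to_pause:.0%}")   # ~13%
```

Even with these pause-favoring assumptions, most of the “AGI after 2050” mass comes from worlds without a pause.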
Note that the full quote in context is:
There’s a Decent Chance of Having Decades
In a similar vein as the above, nobody associated with AI 2027 (or the market, or me) think there’s more than a 95% chance that transformative AI will happen in the next twenty years! I think most of the authors probably think there’s significantly less than a 90% chance of transformative superintelligence before 2045.
Daniel Kokotajlo expressed on the Win-Win podcast (I think) that he is much less doomy about the prospect of things going well if superintelligence is developed after 2030 than before 2030, and I agree. I think if we somehow make it to 2050 without having handed the planet over to AI (or otherwise causing a huge disaster), we’re pretty likely to be in the clear. And, according to everyone involved, that is plausible (but unlikely).
If this is how we win, we should be pushing hard to make it happen and slow things down!
Does Mundane Mandy care about stuff outside the solar system, let alone stuff which is over 1 million light years away?
(Separately, I think the distal light cone is more like 10 B ly than 45 B ly as we can only reach a subset of the observable universe.)
I don’t think it is specific to pixel art; I think it is more about general visual understanding, particularly when you have to figure out downstream consequences from the visual understanding (like “walk to here”).
IMO, it works much better to use SOTA LLMs over Google Translate, at least last I checked.
It should be possible for agents with long-run preferences to strategy-steal, so I don’t see why evolution is an issue from this perspective.
Nitpick:
The average Mechanical Turker gets a little over 75%, far less than o3’s 87.5%.
Actually, average Mechanical Turk performance is closer to 64% on the ARC-AGI evaluation set. Source: https://arxiv.org/abs/2409.01374.
(Average performance on the training set is around 76%, which is what this graph seemingly reports.)
So, I think the graph you pulled the numbers from is slightly misleading.
Do you also dislike Moore’s law?
I agree that anchoring stuff to release dates isn’t perfect because the underlying quantity of “how long it takes until a model is released” is variable, but I think this variability is sufficiently low that it doesn’t cause that much of an issue in practice. The trend is only going to be very solid over multiple model releases and it won’t reliably time things to within 6 months, but that seems fine to me.
I agree that if you add one outlier data point and then trend-extrapolate between just the last two data points, you’ll be in trouble, but fortunately, you can just not do this and instead use more than two data points.
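As a minimal sketch of that point (all release dates and capability numbers below are made up, purely to illustrate fitting across several releases rather than just the last two):

```python
import numpy as np

# Hypothetical releases: months since some reference release, plus a made-up
# capability metric where the final point is an "outlier" jump.
release_months = np.array([0, 7, 13, 20, 26])
capability = np.array([1.0, 1.9, 3.7, 7.5, 30.0])

# Fit a single log-linear trend across all releases.
slope, _ = np.polyfit(release_months, np.log(capability), 1)
print(f"all-points doubling time: ~{np.log(2) / slope:.1f} months")

# Extrapolating from just the last two points is far more sensitive to the outlier.
slope_last_two = (np.log(capability[-1]) - np.log(capability[-2])) / (
    release_months[-1] - release_months[-2]
)
print(f"last-two-points doubling time: ~{np.log(2) / slope_last_two:.1f} months")
```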
This also means that I think people shouldn’t update that much on the individual o3 data point in either direction. Let’s see where things go for the next few model releases.
I don’t think parallelism works very well among employees while it works great for compute.
I agree that labor is probably a somewhat more important input (as in, if you offered an AI company the ability to make its workers 2x faster in serial speed or to give it 2x more compute, they would do better if they took the 2x serial speed). I’d guess the AI companies are roughly indifferent between 1.6x serial speed and 2x compute, but more like 1.35x vs 2x is also plausible.
It seems plausible to me that well-enforced export controls cut compute by a factor of 3 for AI companies in China, and a larger factor is plausible longer term. This would substantially reduce the rate of algorithmic progress IMO.