@gabe_cc
Gabriel Alfour
How middle powers may prevent the development of artificial superintelligence
From Anthony: Control Inversion
From Vitalik: Galaxy brain resistance
Modeling the geopolitics of AI development
Thanks for responding.
For instance, in the past, it would have been conceivable for a single country of the G20 to unilaterally make it their priority to ban the development of ASI and its precursors.
In the past, it would have been conceivable for any country in the West to decide to fight off Big Tech and lead the collective fight.
I think unilateralism + leadership is quite inconceivable right now.
I am interested in any scenario you have in mind (not with the intent to fight whatever you suggest, just to see if there are ideas or mechanisms I may be missing).
And geopolitical will is something which can fluctuate. Currently there is no geopolitical will to do so, but in the future it might emerge (then it might disappear again, and so on).
This is a failure of my writing: I should have made it clear that it’s a PNR (point of no return) precisely when there’s no going back.
My point of “when there’s not enough geopolitical will left” was that, nope, we can reach a point where there’s just not enough left. Not just “right now nobody wants to regulate AI”, but “right now, everything is so captured that there isn’t really any independent will to govern left anymore”.
AI Timelines and Points of no return
This is not a problem, this is completely within the framework!
Even with a single AI accelerationist corporation and a single employee, you may reason about bargaining power (Wikipedia).
What is true regardless of the number of accelerationist corps is that the influx of safety-branded researchers willing to work for them will drive down the safety premium.
For instance, 80,000 Hours has tried to increase their supply, and is proud to have done so. As they say, their advisees now work at DeepMind and Anthropic!
It will cost you nothing to “bribe” a Utilitarian
Nice comment.
This deals with a lot of the themes from the follow-up essay, which I expect you may be interested in.
Curious what makes you think this.
Because there is a reason why Cursor and Claude Code exist. I’d suggest looking at what they do for more details.
METR is not in the business of building code agents. Why is their work informing so much of your views on the usefulness of Cursor or Claude Code?
This is literally the point I make above.
Either you fail to capture the relevant capabilities and build unwarranted confidence that things are ok, or you are doing public competitive elicitation & amplification work.
(I can’t really tell if this post is trying to argue the overhang is increasing or just that there is some moderately sized overhang ongoingly.)
It has increased on some axes (companies are racing as fast as they can, and capital and research are overwhelmingly long on scaling), and decreased on others (low-hanging fruit gets plucked first).
The main point is that it is there and consistently under-estimated.
For instance, there are still massive returns to spending an hour on learning and experimenting with prompt engineering techniques. Let alone more advanced approaches.
This thus leads to a bias towards over-estimating the safety of our systems, unless you expect our evaluators to be better elicitors than not only existing AI research engineers, but also the ones of the next two, five, or ten years.
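To make “returns to prompt engineering” concrete, here is a minimal toy sketch (my own example, not from the post; `query_model`, the prompts, and the scoring are hypothetical stand-ins): the same model, evaluated on the same tasks, can look much weaker under a naive prompt than under one with a role, a worked example, and a step-by-step instruction.

```python
# Toy sketch of the "elicitation gap": the same model, asked the same questions,
# can look much weaker under a naive prompt than under an engineered one.
# `query_model` is a stand-in for whatever model/API you are evaluating.

NAIVE_PROMPT = "Question: {question}\nAnswer:"

ENGINEERED_PROMPT = """You are a careful expert. Work through the problem step by step,
check your reasoning, and only then give a final answer.

Example:
Question: A train travels 60 km in 40 minutes. What is its speed in km/h?
Reasoning: 40 minutes is 2/3 of an hour, so speed = 60 / (2/3) = 90 km/h.
Final answer: 90 km/h

Question: {question}
Reasoning:"""

def query_model(prompt: str) -> str:
    # Stand-in: wire this to whichever model you are measuring.
    raise NotImplementedError

def score(prompt_template: str, tasks: list[tuple[str, str]]) -> float:
    """Fraction of tasks where the model's output contains the expected answer."""
    hits = 0
    for question, expected in tasks:
        output = query_model(prompt_template.format(question=question))
        hits += expected.lower() in output.lower()
    return hits / len(tasks)

# The gap score(ENGINEERED_PROMPT, tasks) - score(NAIVE_PROMPT, tasks) is
# capability that was always in the model, just not elicited by the naive prompt.
```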
Code agents (Cursor or Claude Code) are much better at performing code tasks than their underlying fine-tuned models, mainly because of the scaffolding.
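For readers unfamiliar with the term, here is a minimal sketch of what “scaffolding” refers to (a generic agent loop of my own, with a hypothetical `query_model`; it is not how Cursor or Claude Code are actually implemented): the model is wrapped in a loop that lets it run tools and see the results.

```python
import subprocess

# Minimal sketch of a code-agent scaffold: a loop that lets the model call tools
# and observe the results, which is most of what separates an "agent" from a
# bare fine-tuned model. `query_model` is again a stand-in for the underlying model.

def query_model(transcript: list[str]) -> str:
    # Stand-in: assumed to return either "RUN: <shell command>" or "DONE: <final answer>".
    raise NotImplementedError

def run_tool(command: str) -> str:
    """Execute a shell command (e.g. read a file, run the tests) and capture its output."""
    result = subprocess.run(command, shell=True, capture_output=True, text=True, timeout=60)
    return result.stdout + result.stderr

def agent(task: str, max_steps: int = 10) -> str:
    transcript = [f"Task: {task}"]
    for _ in range(max_steps):
        action = query_model(transcript)
        if action.startswith("DONE:"):
            return action.removeprefix("DONE:").strip()
        if action.startswith("RUN:"):
            observation = run_tool(action.removeprefix("RUN:").strip())
            transcript += [action, f"Observation: {observation}"]
    return "Gave up after max_steps."
```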
When I told you that we should not put 4% of the global alignment budget into AI Village, you asked me if I thought METR should also not get as much funding as it does.
It should now be more legible why.
From my point of view, both AI Village and METR, on top of not doing the straightforward thing of advocating for a pause, are bad on their own terms.
Either you fail to capture the relevant capabilities and build unwarranted confidence that things are ok, or you are doing public competitive elicitation & amplification work.
What is an example of something useful you think could in theory be done with current models but isn’t being elicited in favor of training larger models?
Better prompt engineering, fine-tuning, interpretability, scaffolding, sampling.
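As one concrete instance of “sampling” as an elicitation technique (my example, not the post’s; `sample_model` and the verifier are hypothetical stand-ins): best-of-n sampling against a cheap checker, which squeezes more out of an existing model with no extra training.

```python
# Sketch of best-of-n sampling as elicitation: draw several samples from the
# same model and keep one that passes a cheap check (e.g. unit tests). No
# training involved; the extra capability comes purely from how the existing
# model is used.

from typing import Callable

def sample_model(prompt: str, temperature: float = 0.8) -> str:
    # Stand-in for a sampling call to whatever model you are eliciting from.
    raise NotImplementedError

def best_of_n(prompt: str, verifier: Callable[[str], bool], n: int = 16) -> str | None:
    """Return the first of n samples that the verifier accepts, or None if all fail."""
    for _ in range(n):
        candidate = sample_model(prompt)
        if verifier(candidate):
            return candidate
    return None
```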
Fast-forward button
I think you may be implicitly making an argument along the lines of “But people are already working on this! Markets are efficient!”
To which my response is: “And thus, 3 years from now, we’ll know much more about what to do with models than we do now, even if you personally can’t come up with an example today. The same way we now know much more about what to do with models than we did 3 years ago.”
Unless you expect this not to be the case (e.g. from having directly worked on juicing models, or from having followed people who did so and failed), you shouldn’t really expect your failure to come up with such examples to be informative.
We are likely in an AI overhang, and this is bad.
If you are on Paul/Quentin’s side, “lots of slack” would be enough to concede, but they do not think there’s lots of slack.
If you are on Eliezer/Nate’s side, “little slack” is far from enough to concede: it’s about whether humanity can and will do something with that slack.
So this is not a crux.
Nevertheless, this concept could help prevent a very common failure mode in the debate.
Namely, at any point in the debate, either side could ask “Are you arguing that there is lots/little slack, that we are willing/unwilling to use that slack, or that we are able/unable to use that slack?”, which I expect could clear up some of the talking past each other.
How people politically confront the Modern Eldritch
A simple example where understanding an underlying problem doesn’t solve the problem: I understand fairly well why I’m tempted to eat too many potato chips, and why this is bad for me, and what I could do instead. And yet, sometimes I still eat more potato chips than I intend.
This is a great example.
Some people, specifically thanks to their better understanding of themselves, do not find themselves eating more potato chips than they intend.
There is more.
I believe...
People and society are largely well calibrated. People who are deemed (by themselves or society) to be bad at maths, at sports, at arts, etc. are usually bad at them.
People and society are not perfectly calibrated.
People are sometimes under-confident in specific abilities. This is often downstream of a general lack of confidence.
People are sometimes over-confident in specific abilities. This is often downstream of a general excess of confidence.
Our society does seem to inculcate in its members the idea that certain things are only for super-smart people to do, and whoever you are, you are not smart enough to do an impactful thing.
Most people would fail at passing the bar and the USMLE. This is why most people do not attempt them, and this is why our society tells them not to.
I believe it is load-bearing, but in the straightforward way: it would be catastrophic if everyone tried to study things far beyond their abilities and wasted their time.
Clearly relevant, thanks.