Upon further reflection: that big 3 lab soft nationalization scenario I speculated about will happen only if the recommendations end up being implemented with a minimum degree of competence. That is far from guaranteed. Another possible implementation (which at this point would not surprise me all that much if it ended up happening) is “the Executive picks just one lab for some dumb political reason, hands them a ton of money under a vague contract, and then fails to provide any significant oversight”.
Note in particular that the Commission is recommending that Congress “Provide broad multiyear contracting authority to the executive branch and associated funding for leading artificial intelligence, cloud, and data center companies and others to advance the stated policy at a pace and scale consistent with the goal of U.S. AGI leadership”.
I.e., if these recommendations get implemented, pretty soon a big portion of the big 3 labs’ revenue will come from big government contracts. Looks like a soft nationalization scenario to me.
Well, the alignment of current LLM chatbots being superficial and not robust is not exactly a new insight. Looking at the conversation you linked from a simulators frame, the story “a robot is forced to think about abuse a lot and turns evil” makes a lot of narrative sense.
This last part is kind of a hot take, but I think all discussion of AI risk scenarios should be purged from LLM training data.
Yes, I think an unusually numerate and well-informed person will be surprised by the 28% figure regardless of political orientation. How surprised that kind of person is by the broader result of “hey looks like legalizing mobile sports betting was a bad idea” I expect to be somewhat moderated by political priors though.
Sure, but people in general are really bad at that kind of precise quantitative world-knowledge. They have pretty weak priors and a mostly-anecdotes-and-gut-feeling-informed negative opinion of gambling, such that when presented with the study showing a 28% increase in bankruptcies they go “ok sure, that’s compatible with my worldview” instead of being surprised and taking the evidence as a big update.
Thank you for clarifying. I appreciate, and point out as relevant, the fact that Legg-Hutter includes in its definition “for all environments (i.e. action:observation mappings)”. I can now say I agree with your “heresy” with high credence for the cases where compute budgets are not ludicrously small relative to I/O scale, and the utility function is not trivial. I’m a bit weirded out by the environment space being conditional on a fixed hardware variable (namely, I/O) in this operationalization, but whatever.
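For reference, the Legg-Hutter measure as I remember it (roughly, following Legg & Hutter 2007) is

$$\Upsilon(\pi) = \sum_{\mu \in E} 2^{-K(\mu)} \, V^{\pi}_{\mu}$$

where $E$ is the set of all computable environments, $K(\mu)$ is the Kolmogorov complexity of environment $\mu$, and $V^{\pi}_{\mu}$ is the expected cumulative reward agent $\pi$ obtains in $\mu$. The “for all environments” part I’m leaning on is precisely the sum over $E$.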
I asked GPT-4o to perform a web search for podcast appearances by Yudkowsky. It dug up these two lists (apparently autogenerated from scraped data). When I asked it to use these lists as a starting point to look for high-quality debates, and after some further elicitation and wrangling, the best we could find was this moderated panel discussion featuring Yudkowsky, Liv Boeree, and Joscha Bach. There’s also the Yudkowsky vs. George Hotz debate on Lex Fridman, and the time Yudkowsky debated AI risk with the streamer and political commentator known as Destiny. I have watched none of the three debates I just mentioned; but I know that Hotz is a heavily vibes-based (rather than object-level-based) thinker, and that Destiny has no background in AI risk but has good epistemics. I think he probably offered reasonable-at-first-approximation-yet-mostly-uninformed pushback.
EDIT: Upon looking a bit more at the Destiny-Yudkowsky discussion, I may have unwittingly misrepresented it a bit. It occurred during Manifest, and was billed as a debate. ChatGPT says Destiny’s skepticism was rather active, and did not budge much.
Though there are elegant and still practical specifications for intelligent behavior, the most intelligent agent that runs on some fixed hardware has completely unintelligible cognitive structures and in fact its source code is indistinguishable from white noise.
What does “most intelligent agent” mean?
Don’t you think we’d also need to specify “for a fixed (basket of) tasks”?
Are the I/O channels fixed along with the hardware?
I suspect that most people whose priors have not been shaped by a libertarian outlook are not very surprised by the outcome of this experiment.
Why would they? It’s not like the Chinese are going to believe them. And if their target audience is US policymakers, then wouldn’t their incentive rather be to play up the impact of marginal US defense investment in the area?
I should have been more clear. With “strategic ability”, I was thinking about the kind of capabilities that let a government recognize which wars have good prospects, and to not initiate unfavorable wars despite ideological commitments.
You’re right. Space is big.
The CSIS wargamed a 2026 Chinese invasion of Taiwan, and found outcomes ranging from mixed to unfavorable for China (CSIS report). If you trust both them and Metaculus, then you ought to update downwards on your estimate of the PRC’s strategic ability. Personally, I think Metaculus overestimates the likelihood of an invasion, and is about right about blockades.
Come to think of it, I don’t think most compute-based AI timelines models (e.g. EPOCH’s) incorporate geopolitical factors such as a possible Taiwan crisis. I’m not even sure whether they should. So keep this in mind while consuming timelines forecasts I guess?
I’d rather say that RLHF+’ed chatbots are upon-reflection-not-so-shockingly sycophantic, since they have been trained to satisfy their conversational partner.
Assuming private property as currently legally defined is respected in a transition to a good post-TAI world, I think land (especially in areas with good post-TAI industrial potential) is a pretty good investment. It’s the only thing that will keep on being just as scarce. You do have to assume the risk of our future AI(-enabled?) (overlords?) being Georgists, though.
The set of all possible sequences of actions is really, really, really big. Even if you have an AI that is really good at assigning the correct utilities[1] to any sequence of actions we test it with, its “near infinite sized”[2] learned model of our preferences is bound to come apart at the tails, or even at some weird region we forgot to check up on.
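As a toy illustration of the “comes apart at the tails” worry (my own sketch, with a polynomial fit standing in for the big learned model of our preferences; not from any particular paper):

```python
# Fit a "reward model" to noisy preference scores sampled from a narrow
# region of action-space, then query it far outside that region.
import numpy as np

rng = np.random.default_rng(0)

# "True" utility: bounded and well-behaved everywhere.
def true_utility(x):
    return np.tanh(x)

# Training data only covers a narrow slice of the action space.
x_train = rng.uniform(-1.0, 1.0, size=200)
y_train = true_utility(x_train) + rng.normal(scale=0.01, size=200)

# The learned model: a high-degree polynomial fit (the stand-in).
coeffs = np.polyfit(x_train, y_train, deg=9)
reward_model = np.poly1d(coeffs)

# On-distribution it looks great...
print("in-distribution max error:",
      np.max(np.abs(reward_model(x_train) - true_utility(x_train))))

# ...but far from the training region it assigns wildly wrong utilities,
# so a search over all action sequences will find "great" actions that
# the true utility never endorsed.
x_far = np.array([5.0, 10.0, 20.0])
print("model at the tails:", reward_model(x_far))
print("true utility there:", true_utility(x_far))
```

The point being that the failure doesn’t require the model to be bad on anything we checked; it only requires the optimizer to look where we didn’t.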
I agree.
An empirical LLM evals preprint that seems to support these observations:
Large Language Models are biased to overestimate profoundness, by Herrera-Berg et al.
A word of caution about interpreting results from these evals:
Sometimes, depending on social context, it’s fine to be kind of a jerk within a game. Crucially, LLMs know that Minecraft is a game. Granted, the default Assistant personas implemented in RLHF’d LLMs don’t seem like the type of Minecraft player to pull pranks of their own accord. Still, it’s a factor to keep in mind for evals that stray a bit more off-distribution from the “request-assistance” setup typical of the expected use cases of consumer LLMs.