I agree that, strictly speaking, they don’t need to keep them alive anymore. To be clear, this analysis holds almost as well if you replace people with AIs, with the exception of the points on violence, so most of the analysis doesn’t depend on people being around to live in the society or to be commanded.
A take on the values of the future, assuming AIs automate politics and economics away from humans.
There has been exploration of this topic before, like Jim Buhler’s What Values Will Control The Future sequence and the appendix of relevant work here, as well as books like Foragers, Farmers and Fossil Fuels, which (full disclosure) influenced a lot of my views on how values evolve. It gives a much more plausible picture than views that see value evolution as converging to CEV/moral truth, or that emphasize arbitrary societal factors rather than energy considerations.
Indeed, I think it’s so plausible that we can actually non-trivially constrain the values of post-AGI society even without much empirical evidence.
However, given that AI that automates humans away is likely coming within at most the next 20-30 years, it’s worth thinking at least a bit about what values will be dominant in the future.
While we still mostly don’t have very good predictions of what the post-AGI era will look like, we have uncovered some answers, and have also homed in on some important hinge questions, such that we aren’t completely blind to what values will be dominant in the future.
One of the central questions for a lot of value evolution boils down to: does acausal trade actually become practical for AIs, in a way that removes the constraint earlier governments faced of only being able to hold territory they could send armies to faster than rebellions could overthrow them?
If the answer is yes, then AI values become more arbitrary, and value lock-in could in theory affect the entire accessible universe.
If the answer is no, then it’s a lot easier to constrain what the AI values, value lock-in doesn’t matter, and alignment also matters less as a problem.
One particular example: we can be pretty confident that people will probably be fine with ludicrously large amounts of inequality in both the political and economic dimensions, compared to any other societal type we had in history, including even the farming era. The reason is that with advanced AI, the mechanisms that keep wealth and income inequalities in check will weaken to the point of no longer existing, and the ability of anyone else, like developing countries, to catch up will end, because their labor stops mattering and natural resources can be (and already are) owned by other rich actors in the world like corporations, which Phil Trammell talks about a lot more here.
Economic inequality alone could evolve into us valuing political inequality, via the wealthy buying up land to give themselves powers reserved to states and defending their riches with robots. One other issue is that, while it probably isn’t a problem in the short to medium run and is probably overrated as a problem from an alignment perspective, AI that can genuinely persuade massive numbers of people to do stuff IRL (superpersuasion) is probably going to come in the longer term, via 2 effects:
More citizens of states will be uploads by default, and uploaded brains are probably easier to hijack/jailbreak than current biological brains, because you can reset them arbitrarily to a known and potentially even maximally vulnerable state, which isn’t possible for a biological brain so far (indeed, a lot of jailbreaks/adversarial examples, like the KataGo adversarial attacks, rely on the fact that it’s super easy to trick AIs/reset them continuously).
It’s probably going to be easier to modify citizens using all sorts of tools like genetics, nanotech, uploading and more, which means rulers can erode the population’s values toward whatever values the rulers want.
Also, AI will break the pattern of no one person ruling alone, at least assuming alignment is solved, because you can automate away the police and militaries that would usually check your power.
The level of economic and political inequality that many people will probably accept is closer to the gap between superheroes and the average citizen in modern comics, or between mythic/non-Abrahamic gods and a normal citizen class, than to basically any other society we’ve had in history, and the old deal described below will come back, but far, far more intensely and closer to the limiting process:
Especially revered was the “Old Deal”, Morris’ term for the generalised social contract between classes in agrarian societies: that some have the duty to be commanders (or “shepherds of the people”, in the preferred phrasing of many a king), others to obey those commands, and if everyone follows this script then things work fine.
Gender/sex inequality is an area where I expect the exact opposite trend, continuing the industrial-era trends, mostly because it will become more arbitrary and divorced from economic usefulness (indeed, in a fully automated AI economy, gender roles do not matter anymore, and we don’t even need to reach the limiting process for this to have big impacts).
Attitudes to violence might polarize. On the large scale, wars are inefficient and will get more inefficient relative to other outcomes like peaceful trade or defined borders which neither side will trespass. But on smaller scales, war/murder/violence in general will have lower costs relative to benefits than ever before in history, because atomic precision manufacturing plus backups will make the average citizen way, way harder to kill (you now have to destroy all backups rather than just end one life), which means the murder rate and the assault/serious violence/rape rate could end up diverging a lot.
That said, this isn’t as trivial to determine without empirical evidence, and thus I’m way less confident in this prediction than basically all of my other predictions to date.
This is pretty straightforwardly not true: there are plenty of academics (for example) who are as smart as rationalists but don’t do very broad instrumental reasoning.
Fair point, I was generalizing too much here.
I agree with the literal claim that plenty of people don’t fantasize about becoming all-powerful dictators, but I’d say the percentage of people who don’t fantasize about becoming dictators (even privately, in their heads) or don’t believe an all-powerful dictator is necessary to solve problems/have a good future is much closer to 25-30% than to 90% or more, and this is more of an upper bound than a lower bound.
The reasons why this is the case partially delve into politics that would cause way more heat than light if I discussed them here, but one of them is that a lot of citizens don’t want to get involved in politics and want someone else to solve their problems for them, and one of the unique traits of a lot of non-dictatorial systems of government is that the average person has to be more involved with politics, which lots of people hate doing.
An all-powerful dictator, in a new world order where average citizens make none of the decisions, doesn’t require them to pay attention to their government/politicians, and a lot of people genuinely want the ability to not care about politics at all.
I basically agree with what you notice, and think this is what you’d expect if rationalists were mostly normal relative to other people in their goals, which are mostly selfish and dictatorial, but more intelligent and able to think farther ahead about what instrumental goals their terminal goals imply.
Or put another way, the things rationalists are doing here are things lots of other people would likely do if they were more intelligent, and the truth of the matter is that most people just like all-powerful dictatorships, almost no matter their ideology.
I agree that current coding agents aren’t good enough and tend to add more code than is worth it for a lot of current projects, and the old wisdom that programs are written for humans to read is still correct, mostly because coding agents are complementary to humans, and you can’t fully automate SWE yet.
But if future coding agents fully automate SWEs away, which could happen in the next 2-4 years, then vibecoding will probably be superior to human coding precisely because agents are willing to make code longer and more complex.
One big part of the reason is that, to a large extent, users hate having to learn the rules of a system, expect code to work all the time, and have very, very general use-cases for programs. Combined with the incentives to fully automate work away, this means that in a compute-limited world, many lines of code and lots of complexity are inevitable, because programs have to deal with the complexity of reality, and the approaches that attempt to simplify it ala Solomonoff induction rely way too much on brute-force simulation, which won’t happen in the next 50-100 years even if AI fully automates the economy and politics (80% chance).
I like this portion of a comment by JDP describing the situation:
Here’s the thing about something like Microsoft Office. Alan Kay will always complain that he had word processing and this and that and the other thing in some 50,000 or 100,000 lines of code — orders of magnitude less code. And here’s the thing: no, he didn’t. I’m quite certain that if you look into the details, what Alan Kay wrote was a system. The way it got its compactness was by asking the user to do certain things — you will format your document like this, when you want to do this kind of thing you will do this, you may only use this feature in these circumstances. What Alan Kay’s software expected from the user was that they would be willing to learn and master a system and derive a principled understanding of when they are and are not allowed to do things based on the rules of the system. Those rules are what allow the system to be so compact.
You can see this in TeX, for example. The original TeX typesetting system can do a great deal of what Microsoft Word can do. It’s somewhere between 15,000 and 150,000 lines of code — don’t quote me on that, but orders of magnitude less than Microsoft Word. And it can do all this stuff: professional quality typesetting, documents ready to be published as a math textbook or professional academic book, arguably better than anything else of its kind at the time. And the way TeX achieves this quality is by being a system. TeX has rules. Fussy rules. TeX demands that you, the user, learn how to format your document, how to make your document conform to what TeX needs as a system.
Here’s the thing: users hate that. Despise it. Users hate systems. The last thing users want is to learn the rules of some system and make their work conform to it.
The reason why Microsoft Word is so many lines of code and so much work is not malpractice — it would only be malpractice if your goal was to make a system. Alan Kay is right that if your goal is to make a system and you wind up with Microsoft Word, you are a terrible software engineer. But he’s simply mistaken about what the purpose of something like Microsoft Word is. The purpose is to be a virtual reality — a simulacrum of an 80s desk job. The purpose is to not learn a system. Microsoft Word tries to be as flexible as possible. You can put thoughts wherever you want, use any kind of formatting, do any kind of whatever, at any point in the program. It goes out of its way to avoid modes. If you want to insert a spreadsheet into a Word document anywhere, Microsoft Word says “yeah, just do it.”
It’s not a system. It’s a simulacrum of an 80s desk job, and because of that the code bloat is immense, because what it actually has to do is try to capture all the possible behaviors in every context that you could theoretically do with a piece of paper. Microsoft Word and PDF formats are extremely bloated, incomprehensible, and basically insane. The open Microsoft Word document specification is basically just a dump of the internal structures the Microsoft Word software uses to represent a document, which are of course insane — because Microsoft Word is not a system. The implied data structure is schizophrenic: it’s a mishmash of wrapped pieces of media inside wrapped pieces of media, with properties, and they’re recursive, and they can contain other ones. This is not a system.
For that reason, you wind up with 400 million lines of code. And what you’ll notice about 400 million lines of code is — hey, that’s about the size of the smallest GPT models. You know, 400 million parameters. If you were maximally efficient with your representation, if you could specify it in terms of the behavior of all the rest of the program and compress a line of code down on average to about one floating point number, you wind up with about the size of a small GPT-2 type network. I don’t think that’s an accident. I think these things wind up the size that they are for very similar reasons, because they have to capture this endless library of possible behaviors that are unbounded in complexity and legion in number.
I think my main disagreement with this is that most of the value of AI ultimately comes down to the long tail, and for a number of reasons, having a human as a monitor near-entirely removes the value proposition of AI.
The reasons that I expect to generalize for most jobs include:
Fast reaction times are necessary, as in the mortgage example from John Wentworth quoted below, either due to the physical needs of a job (robotics is especially vulnerable to this, since latencies need to be optimized hard, which is part of the reason biological brains need their extra parameters to achieve the same capabilities as ANNs) or because customers expect instant results.
Personally, I ran into this at a mortgage startup. We wanted to automate as much of the approval process as possible; we figured at least 90% of approval conditions (weighted by how often they’re needed) should be tractable. In retrospect, that was true − 90% of it was pretty tractable. But we realized that, even with the easy 90% automated, we would still need humans most of the time. The large majority of our loans had at least some “hair” on them—something which was weird and needed special handling. Sometimes it was FHA/VA subsidies (each requiring a bunch of extra legwork). Sometimes it was income from a side-gig or alimony. Sometimes it was a condition associated with the appraisal—e.g. a roof repair in-progress. Sometimes it was an ex-spouse on the title. No single issue was very common, but most loans had something weird on them. And as soon as a human needed to be in the loop, at all, most of the automation value was gone—we couldn’t offer substantive instant results.
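To make the quoted point concrete, here is a toy calculation (my own illustration with made-up numbers, not figures from the quoted startup): even when every individual complication is rare, a loan exposed to many independent rare complication types usually has at least one, so a human stays in the loop for most loans.

```python
# Toy model of the long tail: each loan faces many independent, individually
# rare complication types ("hair"). All numbers are illustrative only.
n_issue_types = 20   # e.g. FHA/VA subsidies, side-gig income, roof repairs...
p_each = 0.05        # each complication type appears on 5% of loans

p_clean = (1 - p_each) ** n_issue_types
print(f"P(no complications, fully automatable): {p_clean:.2f}")          # ~0.36
print(f"P(at least one complication, human needed): {1 - p_clean:.2f}")  # ~0.64
```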
Which leads us to another issue: context. One of the things you will see constantly in this job description is that in order to do even one piece of the job well, you need to know all of the context of the job to be productive, and a prior belief of mine is that this is often the normal state of jobs, not a weird exception, especially as the economy grows (this is part of the reason why long-term memory can’t be fully substituted by large context windows).
So what about the strategies to avoid them, like factorization, bureaucracies or making jobs more predictable?
The short answer is that they do help, but they are not one-weird-tricks that remove the bottleneck of humans being the main cost, at least without already assuming full automation of economics and politics. The cases where they do seem to be one-weird-tricks (like the bureaucratization of countries improving general economic productivity, health, and welfare) are cases where the capabilities were already there, but alignment was required to prevent selfish humans from stealing their countries’ wealth and keeping economies from growing.
And due to the cost of hiring humans, AIs would prefer to hire other AIs to do the job of alignment.
And edge cases/inherent complexity are quite prominent in IRL jobs, meaning that without the ability to fully simulate the job (which is not something I expect in the next 50-75 years at least), you cannot make jobs much more predictable than they already are.
So this is my list of reasons why I disagree with the claim that AIs don’t need long-term drives because they can use humans instead.
Another implication is that the value of the long tail reconciles the intuition that AI progress has discontinuities in value with the observed record of progress being continuous in compute and data inputs: while AIs are steadily improving, there are often thresholds, because doing something a little more reliably/a little bit better becomes much more valuable once an AI can fully automate a task instead of being a complement.
This connects to current AIs not being too useful because they are not reliable enough, but unfortunately we cannot reliably measure 99% or 99.9% time horizons without a lot more work, for the reasons Thomas Kwa identified:
Time horizons at 99%+ reliability levels cannot be fit at all without much larger and higher-quality benchmarks.
Measuring 99% time horizons would require ~300 highly diverse tasks in each time bucket. If the tasks are not highly diverse and realistic, we could fail to sample the type of task that would trip up the AI in actual use.
The tasks also need <<1% label noise. If they’re broken/unfair/have label noise, the benchmark could saturate at 98% and we would estimate the 99% time horizon of every model to be zero.
This means that for the time being, we need to infer it from the functional form. For example, a Weibull distribution with k&lt;1 (meaning AI models can correct their errors) would predict 99% time horizons roughly 20x lower than the exponential distribution, where the error rate is constant, so we really need more data.
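As a sketch of that functional-form point (my own illustration of the Weibull-vs-exponential comparison, not Thomas Kwa’s actual fit), normalize both distributions to the same 50% time horizon and compare the implied 99% horizons:

```python
import math

# Success on a task of length t is modeled as a survival curve S(t).
# Exponential (constant hazard, errors never corrected): S(t) = exp(-t/b).
# Weibull with shape k < 1 (hazard falls over time, i.e. models that get
# past the early steps tend to keep going): S(t) = exp(-(t/b)^k).
# Both are normalized so that S = 0.5 at the same 50% horizon.

def horizon(reliability: float, k: float) -> float:
    """Time horizon at a given reliability, in units of the 50% horizon."""
    return (math.log(1 / reliability) / math.log(2)) ** (1 / k)

h99_exp = horizon(0.99, k=1.0)  # ~0.0145x the 50% horizon
for k in (1.0, 0.7, 0.585):
    h99 = horizon(0.99, k)
    print(f"k={k}: 99% horizon = {h99:.5f}x the 50% horizon "
          f"({h99_exp / h99:.1f}x lower than exponential)")
# A shape of roughly k = 0.585 reproduces the ~20x gap mentioned above.
```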
I mean that they are useless to use as a hold-out set, not that they are useless more generally, so good point; I will edit the post.
Specifically, that means the outcomes from the AI not seeing/reacting to the interpretability technique become the same as if the AI was allowed to see the data, because it had learned all the generalizable tricks.
Nice shout out to Knight Lee’s post.
Another reason is that since evals/interp techniques have a finite lifetime before they become ultimately useless as a hold-out set, you should eventually train against the eval/interp technique once its value as a hold-out test set declines to near 0.
I got this point from Herbie Bradley.
But this only argues for training against interp at the end of training, rather than continuously, while your post argues we should train against interp throughout all of training.
I remember that one of the reasons the DoD developed such an anti-China view was that, back in the 2010s, China tended to break trade agreements constantly, showing that it wasn’t a credible dealmaker and that further cooperation was not worth it.
I wish I knew which lesswrong comment said this before.
I basically agree with the intended point that general intelligence in a compute-limited world is necessarily complicated (and I think a lot of people are way too invested in trying to simplify the brain down to the complexity of physics), but I do think you are overselling the similarities between deep learning and the brain, and in particular underselling the challenge of actually updating the model. Unlike current AIs, humans can update their weights at least once a day, always: there is no training cutoff after which the model stops updating, and in practice human weight updates almost certainly happen all the time without a train/test separation. Current AIs do update their weights, but only during a training run lasting a couple of months, after which the weights are frozen and served to customers.
(For those in the know, this is basically what people mean when they talk about continual learning).
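A deliberately minimal sketch of that contrast (the toy data, model, and hyperparameters here are mine, just to make the two regimes concrete):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(10, 1)
opt = torch.optim.SGD(model.parameters(), lr=1e-2)
loss_fn = nn.MSELoss()

def get_batch():
    x = torch.randn(32, 10)
    return x, x.sum(dim=1, keepdim=True)  # toy regression target

# Regime 1: current LLMs. A bounded training run, after which the weights
# are frozen at the data cutoff and served unchanged to customers.
for step in range(1_000):
    x, y = get_batch()
    opt.zero_grad()
    loss_fn(model(x), y).backward()
    opt.step()
model.eval()  # frozen from here on; deployment never touches the weights

# Regime 2: brain-like continual learning. No cutoff and no train/deploy
# split: the model acts on each input, then immediately updates on it.
for step in range(1_000):  # in a brain, this loop never terminates
    x, y = get_batch()
    pred = model(x)  # act first...
    opt.zero_grad()
    loss_fn(pred, y).backward()
    opt.step()       # ...then update, continuously
```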
So while there are real similarities, there are also differences.
Because most of the world is actually complication. This is another thing Alan Kay talks about — the complexity curve versus the complication curve. If you have physics brain, you model the world as being mostly fundamental complexity with low Kolmogorov complexity, and you expect some kind of hyperefficient Solomonoff induction procedure to work on it. But if you have biology brain or history brain, you realize that the complication curve of the outcomes implied by the rules of the cellular automaton that is our reality is vastly, vastly bigger than the fundamental underlying complexity of the basic rules of that automaton.
Another way to put this, if you’re skeptical: the actual program size of the universe is not just the standard model. It is the standard model plus the gigantic seed state after the Big Bang. If you think of it like that, you realize the size of this program is huge. And so it’s not surprising that the model you need to model it is huge, and that this model quickly becomes very difficult to interpret due to its complexity.
I would slightly change this and say that if you can’t brute-force simulate the universe from its fundamental laws, you must take the seed into account, but otherwise it’s a very good point that a lot of people fail to heed (the change doesn’t matter for AI capabilities in the next 50-100 years, and it also doesn’t matter for AI alignment with probability ~0.9999999, but it does matter from a long-term perspective on the future/longtermism).
I actually disagree with this, and would say that if you believe AI alignment is hard and there isn’t a way to make superhuman AI safe without immense capability restraint, then data-center bans are net positive, for the following reason:
Even under the assumption that new paradigms are required, training and experiment compute is still helpful because of scale-dependent algorithmic efficiency, which means that algorithmic progress requires training compute to increase, and this accounts for a significant portion of the algorithmic efficiency we get in practice, as Epoch notes below:
For example, @MITFutureTech found that shifting from LSTMs (green) to Modern Transformers (purple) has an efficiency gain that depends on the compute scale:
- At 1e15 FLOP, the gain is 6.3×
- At 3e16 FLOP, the gain is 26×
Naively extrapolating to 1e23 FLOP, the gain is 20,000×!
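Here is my reconstruction of that naive extrapolation (a two-point power-law fit through the quoted numbers; Epoch’s actual fit presumably uses more data, which is why this only matches the quoted figure to within a factor of ~1.5):

```python
import math

# Fit gain(C) = a * C^b through the two quoted (compute, gain) points,
# then extrapolate to 1e23 FLOP.
c1, g1 = 1e15, 6.3
c2, g2 = 3e16, 26.0

b = math.log(g2 / g1) / math.log(c2 / c1)  # ~0.42
a = g1 / c1**b

print(f"exponent b = {b:.2f}")
print(f"extrapolated gain at 1e23 FLOP = {a * 1e23**b:,.0f}x")
# Prints ~14,000x, the same order of magnitude as the quoted ~20,000x.
```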
Also, the AI Futures Model argues for a 4x slowdown (though this has to be appropriately timed; even later pauses slow down takeoff).
This should probably be a recurring question, ala the Open Threads LW moderators make. To put it in a short sentence: alignment has gotten easier, but humanity has gotten more incompetent and is unwilling to pay large costs for safety.
The reason I say alignment has gotten easier is that we have slowly started to realize that the original goal needed to be revised in part by lowering the capability target.
One of the insights of AI control is that we (probably) don’t actually need to consider aligning super-intelligences in the limit of technological development, or anywhere close to that, and that the first AIs that are both massively useful and pose non-negligible risk of AI takeover are able to be controlled in a way that doesn’t depend on AI alignment working.
To be clear, it’s still quite a daunting challenge, and AI companies/governments have started to be more reckless in AI deployment/progress, so it’s still easy for misalignment to occur, especially if we get more unfavorable paradigms (neuralese actually working would be the big one here, but even more prosaic continual learning/long-term memory could be a big problem for AI alignment).
My median/modal expectation conditional on AI being able to automate all of AI R&D is that we implement half-baked control/alignment, and things are very messy and lots of balls are dropped, but we ultimately survive the ordeal based on cheap strategies like satiating AI preferences working, but that we incur a terrifying amount of risk (as in for example taking on 1-5%, or even 10-90% risk of AI takeover) while attempting to solve AI alignment.
My current take is that the chatbot parasitism, even at its most severe, was basically what you’d expect when you let the general population use a tech that can speak back to them, and I basically agree with Ben Landau-Taylor’s theory that the demand for horrifying stories around AI psychosis is way in excess of the true supply. The biggest reason it got so much focus was GPT-4o plus the fact that we are generally bad at base rates, so I’m unconvinced persona parasitology actually matters that much.
AI existential risks, especially extinction risks from a longtermist perspective, are now way overfunded compared to better-futures work, and longtermism properly interpreted agrees with the common view amongst the general public that sub-existential catastrophes that collapse civilization are at least as important as risks that kill everybody, and in practice more important to prevent than extinction risks.
One major upshot of this is that bio-threats, wars that could collapse civilization entirely, and other threats that kill off a large fraction of the population without making humanity extinct, especially ones coming from AI, are quite a bit more important to prevent than classical AI risk scenarios, and probably deserve more funding than current AI safety.
Related to this, the maxipok heuristic is a bad guide to action, because the expected (and quite likely the actual) distribution of futures is nowhere near as dichotomous as some people think, and because the probability of AGI this century is quite high, it’s quite likely that the effects of non-existential interventions persist.
A better heuristic is to instead focus on a wider portfolio of grand challenges, defined in the article as decisions that could affect the value of the future by at least 0.1%. Another, related heuristic for the long-term alignment of ASI is to scrap the Coherent Extrapolated Volition target and instead make ASIs execute optimal moral trades.
The counting arguments for misalignment, even if they were correct, do not show that AI safety is as difficult as some groups like MIRI claim, absent other very contestable premises that we could attempt to make false.
While I generally agree with you, I’m getting more worried that the caveat of “they’re not studying the latest and greatest frontier models” is particularly applicable here, due to a Liu et al. (2025) paper which shows that in some cases, RLVR can create capabilities out of whole cloth.
So while I do think 2025-era frontier models aren’t influenced much by RLVR, I do expect 2026 and especially 2027-era LLMs to be influenced by RLVR much more relative to today, on both capabilities and alignment.
I agree that induction on data does require an inductive bias if you don’t resort to look-up tables, but I do claim that you were arguing that modern AI’s behavior is determined more by the data than by the architecture, relative to what Max H was saying.
The data. As Zack M Davis argues, one of the takeaways from the deep learning revolution is that inductive biases mattered a lot less than we thought, and data is much more important for AI behavior than we thought.
And once you realize this, the entire scenario falls apart; ultimately it’s anti-capitalist pablum with its bottom line already written, as Alex Armlovich shows here:
Bad econ. “White collar workers switch to Doordash, driving down real wages there too” is partial equilibrium thinking
Robot firms only grow if they’re producing more real goods & services. If production is growing, real incomes are rising not falling. No doom loop!
If production is growing but somehow consumption stalls, that’s just a failure of monetary & fiscal policy. Cut rates to zero quickly; below zero, fiscal policy kicks in to redistribute income to consumers & restore consumption growth to trend
Either there’s no doom loop in the first place, or else New Keynesian monetary & fiscal policy kicks in to close any emergent wedge between robot output & human consumption
This piece is ultimately just anticapitalist pablum (teasing a belief that most markets are just scams and rent seeking in the intro!)--and after that, the piece simply underrates or misunderstands the stabilizing powers of liberal democratic Keynesian capitalism to keep output & consumption in balance
We don’t have to allow what @delong calls “a failure of the exchange mechanism” in the future any more than we did in the 1930s
I disagree with this conclusion, actually, because I didn’t say that AI developers or AIs themselves would attempt to exterminate humanity; I only said that my analysis was compatible with that outcome, and so it was more general than you thought.
In order to reach this conclusion, you also need opinions on how likely this is to happen.