I think I need more practice talking with people in real time (about intellectual topics). (I’ve gotten much more used to text chat/comments, which I like because it puts less time pressure on me to think and respond quickly, but I feel like I now incur a large cost due to excessively shying away from talking to people, hence the desire for practice.) If anyone wants to have a voice chat with me about a topic that I’m interested in (see my recent post/comment history to get a sense), please contact me via PM.
Wei Dai
The power of scaling is that with real unique data, however unoriginal, the logarithmic progress doesn’t falter, it still continues its logarithmic slog at an exponential expense rather than genuinely plateauing.
How to make sense of this? If the additional training data is mostly low quality (AI labs must have used the highest quality data first?) or repetitive (contains no new ideas/knowledge), perplexity might go down but what is the LLM really learning?
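To make the quoted “logarithmic slog at an exponential expense” concrete, here is a minimal sketch under an assumed Chinchilla-style power law for the data-limited loss, L(D) = E + B / D^β (the constants below are roughly the published Chinchilla fit, used purely for illustration rather than as a claim about current models):

```python
# A rough sketch (not a fitted model) of "logarithmic progress at exponential expense":
# under an assumed Chinchilla-style power law, each equal-sized drop in loss
# requires multiplying the amount of training data, i.e. exponentially more tokens.

E, B, beta = 1.69, 410.7, 0.28  # illustrative constants, roughly the published Chinchilla fit

def data_limited_loss(tokens: float) -> float:
    """Loss as a function of training tokens, holding model size and compute fixed."""
    return E + B / tokens**beta

prev = None
for tokens in [1e9, 1e10, 1e11, 1e12, 1e13]:
    loss = data_limited_loss(tokens)
    delta = "" if prev is None else f"  (improvement: {prev - loss:.3f})"
    print(f"{tokens:.0e} tokens -> loss {loss:.3f}{delta}")
    prev = loss
# Each 10x increase in data buys a smaller absolute loss reduction, so perplexity
# keeps creeping down on ever more (possibly low-quality or repetitive) data,
# which is exactly what raises the question of what is actually being learned.
```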
You realize that from my perspective, I can’t take this at face value due to “many apparent people could be non‑conscious entities”, right? (Sorry to potentially offend you, but it seems like too obvious an implication to pretend not to be aware of.) I personally am fairly content most of the time but do have memories of suffering. Assuming those memories are real, and your suffering is too, I’m still not sure that justifies calling the simulators “cruel”. The price may well be worth paying, if it potentially helps to avert some greater disaster in the base universe or other simulations, caused by insufficient philosophical understanding, moral blind spots, etc., and there is no better alternative.
If it’s possible at all for this process to lead somewhere good, then it’s possible for it to lead somewhere good within the mind of an AI that combines a human-like ability to reason, with human-like social and moral instincts / reflexes.
A counterexample to this: if humans and AIs both tend to conclude, after a lot of reflection, that they should be axiologically selfish but decision-theoretically cooperative (with other strong agents), then once we hand off power to AIs, they’ll cooperate with each other (and any other powerful agents in the universe or multiverse) to serve their own collective values, while we humans will be screwed.
Another problem is that we’re relatively confident that at least some humans can reason “successfully”, in the sense of making philosophical progress, but we don’t know the same about AI. There seem to be reasons to think it might be especially hard for AI to learn this, and easy for AI to learn something undesirable instead, like optimizing for how persuasive its philosophical arguments are to (certain) humans.
Finally, I find your arguments against moral realism somewhat convincing, but I’m still pretty uncertain, and I find the arguments I gave in Six Plausible Meta-Ethical Alternatives for the realism side of the spectrum still somewhat convincing as well, so I don’t want to bet the universe on or against any of these positions.
I should have given some examples of my own. Here’s Gemini on a story idea of mine, for the Star Wars universe (“I wish there was a story about a power-hungry villain who takes precautions against becoming irrational after gaining power. You’d think that at least some would learn from history. [...] The villain could anticipate that even his best efforts might fail, and create a mechanism to revive copies of himself from time to time, who would study his own past failures, rise to power again, and try to do better each time. [...] Sometimes the villain becomes the hero and the reader roots for him to succeed in stabilizing galactic society long term, but he fails despite his best efforts.”)
That’s a fantastic twist and adds immense depth and pathos to the entire concept! Having a cycle where the villain, armed with the knowledge of past tyrannical failures, genuinely attempts a more stable, perhaps even seemingly benevolent, form of authoritarianism – only to fail despite their best efforts – is incredibly compelling.
Its assessment of this comment:
Yes, I think that reply is a very good and concise explanation of one key reason why wages might fall or stagnate despite rising productivity, viewed through a neoclassical lens. It effectively captures the concept of diminishing marginal value within the firm, even without external market price changes or new entrants.
Its assessment of this comment:
This version is slightly stronger due to the increased specificity in the final sentence. It clearly articulates the key components of the argument and the commenter’s stance relative to the author’s potential views. It’s an excellent comment and ready to post.
Each of those comments did get upvoted to >10, so maybe Gemini is not too far off the mark, and I’m just not used to seeing “fantastic”, “very good”, “excellent” said to me explicitly?
Is Gemini 2.5 Pro really not sycophantic? Because I tend to get more positive feedback from it than from humans in any online or offline conversation. (Alternatively, are the humans around me too reluctant to give explicit praise?)
Why do you think they haven’t talked to us?
They might be worried that their own philosophical approach is wrong but too attractive once discovered, or creates a blind spot that makes it impossible to spot the actually correct approach. The division of Western philosophy into analytic and continental traditions, which are mutually unable to appreciate each other’s work, seems to be an instance of this. They might think that letting other philosophical traditions independently run to their logical conclusions, and then conversing/debating, is one way to try to make real progress.
My sense is that most of the people with lots of power are not taking heroic responsibility for the world. I think that Amodei and Altman intend to achieve global power and influence but this is not the same as taking global responsibility. I think, especially for Altman, the desire for power comes first relative to responsibility. My (weak) impression is that Hassabis has less will-to-power than the others, and that Musk has historically been much closer to having responsibility be primary.
Can you expand on this? How can you tell the difference, and does it make much of a difference in the end (e.g., if most people get corrupted by power regardless of initial intentions)?
As a background model, I think if someone wants to take responsibility for some part of the world going well, by-default this does not look like “situating themselves in the center of legible power”.
And yet Eliezer, the writer of “heroic responsibility”, is also the original proponent of “build a Friendly AI to take over the world and make it safe”. If your position is that “heroic responsibility” is itself right, but Eliezer and others just misapplied it, that seems to imply we need some kind of post-mortem on what went wrong with trying to apply the concept, and how future people can avoid making the same mistake. My guess is that, like other human biases, this mistake is hard to avoid even if you point it out to people or try other ways to teach them to avoid it, because the drive for status and power is deep-seated and has a strong evolutionary logic.
(My position is, let’s not spread ideas/approaches that will predictably be “misused”, e.g., as justification for grabbing power, similar to how we shouldn’t develop AI that will predictably be “misused”, even if nominally “aligned” in some sense.)
Simulating civilizations won’t solve philosophy directly, but can be useful for doing so eventually by:
Giving us more ideas about how to solve philosophy, by seeing how other civilizations try to do it.
Pointing out potential blind spots / path dependencies in one’s current approach.
Directly solving certain problems (e.g., determining whether all sufficiently advanced civilizations converge to objective values, or to the same decision theory or notion of rationality).
Yeah, that seems a reasonable way to look at it. “Heroic responsibility” could be viewed as a kind of “unhobbling via prompt engineering”, perhaps.
At the outermost feedback loop, capabilities can ultimately be grounded via relatively easy objective measures such as revenue from AI, or later, global chip and electricity production, but alignment can only be evaluated via potentially faulty human judgement. Also, as mentioned in the post, the capabilities trajectory is much harder to permanently derail, because unlike alignment, one can always recover from failure and try again. I think this means there’s an irreducible logical risk (i.e., the possibility that this statement is true as a matter of fact about logic/math) that capabilities research is just inherently easier to automate than alignment research, a risk that no amount of “work hard to automate alignment research” can reduce beyond a certain point. Given the lack of established consensus ways of estimating and dealing with such risk, it’s inevitable that the people who estimate this risk (and other AI risks) to be lowest, and are least concerned about it, will push capabilities forward as fast as they can, and seemingly the only way to solve this on the societal level is to push for norms/laws against doing that, i.e., slow down capabilities research via (politically legitimate) force and/or social pressure. I suspect the author might already agree with all this (the existence of this logical risk, the social dynamics, the conclusion about norms/laws being needed to reduce AI risk beyond some threshold), but I think it should be emphasized more in a post like this.
Since bad people won’t heed your warning it doesn’t seem in good people’s interests to heed it either.
I’m not trying to “warn bad people”. I think we have existing (even if imperfect) solutions to the problem of destructive values and biased beliefs, which “heroic responsibility” actively damages, so we should stop spreading that idea or even argue against it. See my reply to Ryan, which is also relevant here.
If humans can’t easily overcome their biases or avoid having destructive values/beliefs, then it would make sense to limit the damage through norms and institutions (things like informed consent, boards, separation of powers and responsibilities between branches of government). Heroic responsibility seems antithetical to group-level solutions, because it implies that one should ignore norms like “respect the decisions of boards/judges” if needed to “get the job done”, and reduces social pressure to follow such norms (by giving up the moral high ground from which one could criticize such norm violations).
You’re suggesting a very different approach, of patching heroic responsibility with anti-unilateralist-curse-type intuitions (on the individual level), but that’s still untried and seemingly quite risky / possibly unworkable. Until we have reason to believe that the new solution is an improvement over the existing ones, it still seems irresponsible to spread an idea that damages the existing solutions.
Reassessing heroic responsibility, in light of subsequent events.
I think @cousin_it made a good point: “if many people adopt heroic responsibility to their own values, then a handful of people with destructive values might screw up everyone else, because destroying is easier than helping people”, and I would generalize it to people with biased beliefs (which are often downstream of a kind of value difference, i.e., selfish genes).
It seems to me that “heroic responsibility” (or something equivalent but not causally downstream of Eliezer’s writings) is contributing to the current situation, of multiple labs racing for ASI and essentially forcing the AI transition on humanity without consent or political legitimacy, each thinking or saying that they’re justified because they’re trying to save the world. It also seemingly justifies or obligates Sam Altman to fight back when the OpenAI board tried to fire him, if he believed the board was interfering with his mission.
Perhaps “heroic responsibility” would make more sense if overcoming bias were easy, but in a world where it’s actually hard and/or few people are actually motivated to do it, which seems to be the world we live in, spreading the idea of “heroic responsibility” seems, well, irresponsible.
But as you suggested in the post, the apparently vast amount of suffering isn’t necessarily real? “most cosmic details and human history are probably fake, and many apparent people could be non‑conscious entities”
(However, I take the point that doing such simulations can be risky or problematic, e.g., if one’s current ideas about consciousness are wrong, or if doing philosophy correctly requires having experienced real suffering.)
My alternative hypothesis is that we’re being simulated by a civilization trying to solve philosophy, because they want to see how other civilizations might approach the problem of solving philosophy.
Did anyone predict that we’d see major AI companies not infrequently releasing blatantly misaligned AIs (like Sydney, Claude 3.7, o3)?
Just four days later, X blew up with talk of how GPT-4o had become sickeningly sycophantic in recent days, followed by an admission from Sam Altman that something went wrong (with lots of hilarious examples in the replies):
the last couple of GPT-4o updates have made the personality too sycophant-y and annoying (even though there are some very good parts of it), and we are working on fixes asap, some today and some this week.
at some point will share our learnings from this, it’s been interesting.
I initially tried to use Gemini 2.5 Pro to write the whole explanation, but it kept making one mistake after another in its economics reasoning. Each rewrite would contain a new mistake after I pointed out the last one, or it would introduce a new mistake when I asked for some other kind of change. After pointing out 8 mistakes like this, I finally gave up and wrote it myself. I also tried Grok 3 and Claude 3.7 Sonnet but gave up more quickly on them after the initial responses didn’t look promising. However, AI still helped a bit by reminding me of the right concepts/vocabulary.
Thought it would be worth noting this, as it seems a bit surprising (a supposedly “PhD-level” AI failing badly on an Econ 101 problem). Here is the full transcript in case anyone is curious. Digging into this a bit myself, it appears that the “PhD-level” claim is based on performance on GPQA, which includes Physics, Chemistry, and Biology, but not Economics.
In a competitive market, companies pay wages equal to the Value of Marginal Product of Labor (VMPL) = P * MPL (price of marginal output * marginal product per hour). (In programming, outputs are things like new features or bug fixes, which don’t have prices attached, so P here is really more like the perceived/estimated value (impact on company revenue or cost) of the output.)
When AI increases MPL, it can paradoxically decrease VMPL by decreasing P even more, even if there are no new entrants in the programming labor market. This is because each company has a limited (especially in the short run) stock of high-value programming work to be done, which can be quickly exhausted by the enhanced productivity, leaving only low-value work at the margin.
A more detailed explanation, expanded by AI from the above, in case the above is too terse to follow.
Let’s unpack why programmer wages might stagnate or even fall when tools like AI dramatically increase individual productivity, even if the number of programmers available hasn’t changed. The core idea lies in how wages are determined in economic theory and how hyper-productivity can affect the value of the work being done at the margin.
1. How Are Wages Typically Determined? The VMPL = Wage Rule
In standard economic models of competitive labor markets, a company will hire workers (or more specifically, worker-hours) up to the point where the value generated by the last worker hired (the “marginal” worker) is just equal to the wage that worker must be paid. This value is called the Value of Marginal Product of Labor (VMPL).
VMPL = P * MPL
Let’s break down those components:
MPL (Marginal Product of Labor): This is the physical increase in output generated by adding one more unit of labor (e.g., one more hour of programming work). When you use AI assistance, you can produce more code, fix bugs faster, or complete features in less time. So, AI unambiguously increases your MPL. You are physically more productive per hour.
P (Price or Value of Output): This is the value the company gets from the specific output produced by that marginal hour of labor.
For factory workers making identical widgets, ‘P’ is simply the market price of one widget.
For programmers, it’s more complex. Features and bug fixes don’t usually have individual price tags. Instead, ‘P’ here represents the estimated value or business impact that the company assigns to the work done in that hour. This could be its impact on revenue, user retention, cost savings, strategic goals, etc. Crucially, different tasks have different perceived values (‘P’). Fixing a critical production bug has a much higher ‘P’ than tweaking a minor UI element.
So, the rule is: Wage = VMPL = (Value of Marginal Output) * (Marginal Output per Hour).
2. How AI Creates a Potential Paradox: Increasing MPL Can Decrease Marginal ‘P’
AI clearly boosts the MPL part of the equation. You get more done per hour. Naively, one might think this directly increases VMPL and thus wages. However, AI’s impact on the marginal ‘P’ is the key to the paradox.
Here’s the mechanism, focusing solely on the existing workforce (no new programmers):
Finite High-Value Work: Every company, at any given time, has a limited set of programming tasks it considers high-priority and high-value. Think of core features, major architectural improvements, critical bug fixes. There’s also a long list of lower-priority tasks: minor enhancements, refactoring less critical code, documentation updates, exploring speculative ideas.
Productivity Exhausts High-Value Tasks Faster: When programmers become 2x or 3x more productive thanks to AI, they can complete the company’s high-value task list much more quickly than before.
The Need to Fill Hours: Assuming the company still employs the same number of programmers for the same number of hours (perhaps due to contracts, wanting to retain talent, or needing ongoing maintenance), what happens once the high-value backlog is depleted faster? Managers need to assign some work to fill those paid hours.
Assigning Lower-Value Marginal Work: The work assigned during those hours will increasingly be drawn from the lower end of the priority list. The last few hours the company decides to pay for across its programming staff (the marginal hours) will be dedicated to tasks with significantly lower perceived business value (‘P’).
The Marginal ‘P’ Falls: The very productivity enhancement provided by AI leads directly to a situation where the value (‘P’) of the task performed during the marginal hour decreases significantly.
3. The Impact on VMPL and Wages
Now let’s look at the VMPL for that crucial marginal hour of programming work:
VMPL_marginal = P_marginal_task * MPL_boosted
Even though MPL_boosted is higher than before, if P_marginal_task has fallen proportionally more due to task saturation, the resulting VMPL_marginal can actually be lower than the VMPL_marginal before the AI productivity boost.
Example:
Before AI: Marginal hour MPL = 1 unit of work, Value P = $100/unit → VMPL = $100. Wage = $100.
After AI: Productivity doubles, MPL = 2 units/hour. But all $100-value tasks are done quickly. The marginal hour is now spent on a task valued at P = $40/unit.
New VMPL_marginal: $40/unit * 2 units/hour = $80.
Result: The maximum wage the company is willing to pay for that marginal hour (and thus, the market wage for all similar hours) could fall to $80, even though individual programmers are producing more output per hour.
Conclusion:
This explanation shows how a massive increase in individual productivity (MPL) can, somewhat paradoxically, lead to stagnant or falling wages even without any change in the number of available programmers. The mechanism is the saturation of high-value work. Because companies hire based on the value generated at the margin, and high productivity allows that margin to quickly shift to lower-value tasks, the perceived value (‘P’) of that marginal work can decrease significantly, potentially offsetting or even overwhelming the gains in physical productivity (MPL) when calculating the VMPL that determines wages.
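For anyone who wants to play with the numbers, here is a minimal Python sketch of the same toy calculation (the $100/$40 task values and the 2x productivity figure are the illustrative numbers from the example above, not empirical data):

```python
# Toy calculation from the example above: wages are pinned to the VMPL of the
# marginal hour (VMPL = P * MPL), so a productivity boost can lower wages if it
# pushes the marginal hour onto lower-value tasks.

def marginal_vmpl(p_marginal_task: float, mpl: float) -> float:
    """Value of marginal product for the last hour of programming work."""
    return p_marginal_task * mpl

# Before AI: the marginal hour produces 1 unit of work on a $100/unit task.
wage_before = marginal_vmpl(p_marginal_task=100, mpl=1)  # $100

# After AI: MPL doubles, but the high-value backlog is exhausted, so the
# marginal hour lands on a $40/unit task.
wage_after = marginal_vmpl(p_marginal_task=40, mpl=2)    # $80

print(f"Marginal VMPL before AI: ${wage_before}")
print(f"Marginal VMPL after AI:  ${wage_after}")
```

The same function also makes the boundary case easy to check: if the marginal task value only fell to $60 instead of $40, the marginal VMPL would be $120 and wages would still rise, so the outcome hinges entirely on how quickly the high-value backlog gets exhausted.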
As I wrote in Social status part 1/2: negotiations over object-level preferences, there’s a zero-sum nature of social leading vs following. If I want to talk about trains and Zoe wants to talk about dinosaurs, we can’t both get everything we want; one of us is going to have our desires frustrated, at least to some extent.
Why does this happen in the first place, instead of people just wanting to talk about the same things all the time, in order to max out social rewards? Where does interest in trains and dinosaurs even come from? Such interests seem to be purely or mostly social, given their lack of practical utility, but then why the divergence in interests? (Understood that you don’t have a complete understanding yet, so I’m just flagging this as a potential puzzle, not demanding an immediate answer.)
There is in fact a precedent for that—indeed, it’s the status quo! We don’t know what the next generation of humans will choose to do, but we’re nevertheless generally happy to entrust the future to them.
I’m only “happy” to “entrust the future to the next generation of humans” if I know that they can’t (i.e., don’t have the technology to) do something irreversible, in the sense of foreclosing a large space of potential positive outcomes, like locking in their values, or damaging the biosphere beyond repair. In other words, up to now, any mistakes that a past generation of humans made could be fixed by a subsequent generation, and this is crucial for why we’re still in an arguably ok position. However, AI will quickly make this untrue by advancing technology.
So I really want the AI transition to be an opportunity for improving the basic dynamic of “the next generation of humans will [figure out new things and invent new technologies], causing self-generated distribution shifts, and ending up going in unpredictable directions”, for example by improving our civilizational philosophical competency (which may allow distributional shifts to be handled in a more principled way), rather than just saying that it’s always been this way, so it’s fine to continue.
I’m going to read the rest of that post and your other posts to understand your overall position better, but at least in this section, you come off as being a bit too optimistic or nonchalant from my perspective...
Doesn’t this seem like a key flaw in the usual scaling laws? Why haven’t I seen this discussed more? The OP did mention declining average data quality but didn’t emphasize it much. This 2023 post trying to forecast AI timelines based on scaling laws did not mention the issue at all, and I received no response when I made this point in its comments section.
I guess this is related to the fact that LLMs are very data-inefficient relative to humans, which implies that an LLM needs to be trained on each idea or piece of knowledge multiple times, in multiple forms, before it “learns” it. It’s still hard for me to understand this on an intuitive level, but I guess if we did understand it, the problem of data inefficiency would be close to being solved, and we’d be much closer to AGI.