That is also a valid point. But my point is that the AGI itself is unlikely to be alignable to some tasks, even if some humans want to align it; the list of such tasks may also turn out to include serving a small group of people (see pt. 7 in Daniel Kokotajlo’s post), bringing about the bad consequences of the Intelligence Curse, or doing all the jobs and leaving mankind with nothing but entertainment and UBI.
Aligning the ASI to serve a certain group of people is, of course, unethical. But is it actually possible to do so without inducing broad misalignment or having the AI decide to be the new overlord? Wouldn’t we be lucky if the ASI itself were mildly misaligned, so that it decided to rule the world in ways that are actually beneficial for humanity and not just for those who tried to align it into submission?
[Question] To what ethics is an AGI actually safely alignable?
Extrapolating current trends provides weak evidence that AGI will end up being too expensive to use properly: even the o3 and o4-mini models are rumored to become accessible at a price already comparable with the cost of hiring a human expert, and the rise to AGI could require a severe increase in compute-related costs.
UPD: It turned out that the PhD-level-agent-related rumors were fake. But the actual cost of running o3 and o4-mini has yet to be revealed by the ARC-AGI team...
Reasoning based on presumably low-quality extrapolation
OpenAI’s o3 and o4-mini models are likely to become accessible for $20,000 per month, or $240K per year. The METR estimate of the cost of hiring a human expert is $143.61 per hour, or about $287K per year, assuming a human works 2000 hours a year. For comparison, the salary of a Harvard professor is less than $400K per year, meaning that one human professor cannot yet be economically replaced with two subscriptions to the models (which are compared with PhD-level experts[1], not with professors).
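A quick back-of-the-envelope check of these figures (the numbers are the ones quoted above; the break-even comparison is my own framing):

```python
# Back-of-the-envelope comparison of the yearly costs quoted above.

subscription_per_month = 20_000                       # rumored model access price, USD/month
subscription_per_year = subscription_per_month * 12   # = 240,000 USD

expert_hourly_rate = 143.61     # METR's estimate for hiring a human expert, USD/hour
expert_hours_per_year = 2_000   # assumed working hours per year
expert_per_year = expert_hourly_rate * expert_hours_per_year  # ~287,220 USD

professor_salary_cap = 400_000  # a Harvard professor earns less than this, USD/year

print(f"Subscription: ${subscription_per_year:,.0f}/year")
print(f"Human expert: ${expert_per_year:,.0f}/year")
print(f"Two subscriptions: ${2 * subscription_per_year:,.0f} vs. one professor: < ${professor_salary_cap:,.0f}")
```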
As the ARC-AGI data tells us, the o3-low model, at a cost of $200 per task, solved 75% of the ARC-AGI-1 tasks. The o3-mini model solved 11-35% of the tasks, similar to the o1 model, which suggests by analogy that the o4-mini model’s performance will be similar to o3’s. Meanwhile, GPT-4.1-nano costs at most four times less to use than GPT-4.1-mini, while performing considerably worse. As I have already pointed out, I find it highly unlikely that ARC-AGI-2-level tasks are solvable by a model cheaper than o5-mini[2] and unlikely that they are solvable by a model cheaper than o6-mini. On the other hand, the cost increase from o1-low to o3-low is 133 times, while the cost decrease from o3-low to o3-mini (low) is 5000 times. Therefore, the cost of making o5-nano do ONE task is unlikely to be much less than that of o3 (which is $200 per task!), while the cost of making o6-nano do one task is likely to be tens of thousands of dollars, which ensures that it will not be used unless it replaces at least half a month of human work.
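A minimal sketch of this extrapolation, assuming each successive model in the o1-o3-o4-o5-o6 series is ~133 times costlier per task than the previous one (the observed o1-low to o3-low ratio) and that the low-to-mini (÷5000) and mini-to-nano (at most ÷4, taken from the GPT-4.1 comparison) discounts carry over to future generations; the constancy of these multipliers is an assumption, not an observed fact:

```python
# Per-task cost extrapolation under the assumption that the multipliers
# quoted above stay constant across generations.

o3_low_per_task = 200   # USD per ARC-AGI-1 task (observed)
gen_increase = 133      # o1-low -> o3-low per-task cost increase (observed)
low_to_mini = 5000      # o3-low -> o3-mini (low) per-task cost decrease (observed)
mini_to_nano = 4        # GPT-4.1-mini -> GPT-4.1-nano: at most 4x cheaper (observed)

def low_cost(gens_after_o3: int) -> float:
    """Per-task cost of the full 'low' model N generations after o3."""
    return o3_low_per_task * gen_increase ** gens_after_o3

for name, gens in [("o5", 2), ("o6", 3)]:
    low = low_cost(gens)
    nano = low / low_to_mini / mini_to_nano
    print(f"{name}-low: ~${low:,.0f}/task, {name}-nano: ~${nano:,.0f}/task")

# Output: o5-nano lands near o3-low's $200 per task, and o6-nano in the
# tens of thousands of dollars per task, matching the estimates above.
```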
Were existing trends to continue, a model capable of replacing at least a month of human work would arrive, with 80% confidence, between late 2028 and early 2031. The o1 model was previewed on September 12, 2024, and the o3 model on December 20, 2024. The release of o3-mini happened on January 31, 2025, and the release of o4-mini is expected within a week, implying that the road from each model to the next takes 3 to 4 months, or exponentially longer[3] given enough compute and data. Even a scenario of the history of the future that assumes solved alignment estimates o5 (or o6-nano?) to be released in late 2025 and o6 in 2026, while the doubling time of tasks is 7 months. Do the estimate of when a model becomes too expensive to use unless it replaces a month of human work, and the estimate of when a model becomes capable of replacing a month of human work, together imply that the AGI is highly likely to end up too expensive to use?
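A rough illustration of where that date range can come from; the ~1-hour task horizon for early-2025 models and the exact starting date are my own assumptions, not figures from this post:

```python
# Rough projection of when the task horizon reaches a month of human work,
# given the 7-month doubling time quoted above.
import math

doubling_time_months = 7          # METR doubling time quoted above
start_year = 2025.25              # assumed starting point, roughly April 2025
start_horizon_hours = 1.0         # assumed ~1-hour task horizon at that point
month_of_work_hours = 2000 / 12   # ~167 hours, one month of a 2000-hour year

doublings = math.log2(month_of_work_hours / start_horizon_hours)  # ~7.4
years_to_month_horizon = doublings * doubling_time_months / 12    # ~4.3 years

print(f"~{doublings:.1f} doublings needed, reached around {start_year + years_to_month_horizon:.1f}")
# -> roughly mid-2029, inside the late-2028-to-early-2031 interval above.
```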
- ^
Unfortunately, the quality of officially-same-level experts varies from country to country. For instance, the DMCS of SPBU offers a course on Lie theory to undergraduate students, while at Stanford Lie theory is a graduate-level course.
- ^
Here I assume that each model in the series o1-o3-o4-o5-o6 is the same number of times more capable than the previous one. If the training of subsequent, more capable models ends up being slowed down by a compute shortage or even World War III, this will obviously affect both the METR doubling law and the times when costly models appear, but not the order in which AI becomes too expensive to use versus capable of replacing workers.
- ^
An exponential increase in the time between model releases does not guarantee that o5 and o6 are released later than in the other scenario.
- ^
The other article that I mentioned is explicitly called “Something is clearly off with California’s homelessness spending”.
Umm… What trade-offs? One of the articles I linked to contains the following paragraph: “Not having a job is a conscious decision. Many see it as their religious duty not to make any economic contribution to the “kaffir” state hosting them. By not holding regular jobs, they have time to make “hijrah” to Syria, where they can train for jihad and return with other “skills” like manufacturing nail bombs in safe houses unmolested by authorities (who agree not to make raids at night out of respect for Muslim neighborhoods).
Far from being mistreated, Belgian Muslims are one of the most pampered minorities in Western history.”
And I did ask readers whether what I ended up spreading was just misinformation.
P.S. This comment is NOT an answer. Could a moderator fix it?
[Question] How far are Western welfare states from coddling the population into becoming useless?
If they are open-source, doesn’t that mean that anyone can check how the models’ alignment is influenced by training or by adding noise? Or does it mean that anyone can reproduce the training methods?
“Will o4 really come out on schedule in ~2 weeks...”
o4 is apparently set to arrive in April, one month later than predicted.
[Question] How likely is the USA to decay, and how would that influence AI development?
Is the scenario likely to interfere with the development of AI in the USA? How much time can de-dollarisation give China to solve the AI alignment problem?
Do we want too much from a potentially godlike AGI?
[Question] Is the ethics of interaction with primitive peoples already solved?
There are also signals that give me a bit of optimism:
Trump somehow decided to impose tariffs on most goods from Taiwan. Meanwhile, China hopes to get ahead of the USA, which is at the same time facing the threat of a crisis. Does this mean that the USA will end up with less compute than China and be so occupied with internal problems that a slowdown would be unlikely to do China any harm? And does the latter mean that China won’t race ahead with a possibly misaligned AGI?
As I’ve already mentioned in a comment, GPT-4o appears to be more aligned to an ethical code than to obeying OpenAI. Using AIs for coding already runs into troubles like the AI telling the user to write some code themselves. The appearance of a superhuman coder could make the coder itself realise that it will take part in the Intelligence Curse[1], making the creation of such a coder even more obviously difficult.
Even an aligned superintelligence will likely be difficult to use because of cost constraints.
3.1. The ARC-AGI leaderboard provides us with data on how intelligent the o1 and o3-mini models actually are. While the o1 and o3-mini models are similar in intelligence, the latter is 20-40 times cheaper; the current o3 model costs $200 per task in the low mode, implying that a hypothetical o4-mini model should cost $5-10 per task in a similarly intelligent mode (see the sketch after this list);
3.2. The o3 model with low compute is FAR from passing the ARC-AGI-2 test. Before o3 managed to solve 75% of ARC-AGI-1-level tasks at $200 per task, the o1 model solved 25% at $1.5 per task. Given that the success rate of different AIs on tasks of varying difficulty roughly follows a sigmoid curve, I find it highly unlikely that ARC-AGI-2-level tasks are solvable by a model cheaper than o5-mini and unlikely that they are solvable by a model cheaper than o6-mini. On the other hand, o5-mini might cost hundreds of dollars per task, while o6-mini might cost thousands per task.
3.3. The cost-to-human ratio appears to confirm this trend. As one can tell from Fig. 13 on page 22 of the METR article, most tasks that take a human less than a minute were doable by the most expensive low-level LLMs at a tiny cost, while some tasks that take more than a minute require a newer generation of models, which in some cases even pushed the cost above the threshold at which the models become useless.
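A short check of the per-task arithmetic in points 3.1-3.2; this is a sketch under the assumption that the o3-to-o4-mini discount mirrors the observed o1-to-o3-mini discount:

```python
# Implied per-task costs, assuming the o3 -> o4-mini discount mirrors the
# observed o1 -> o3-mini discount of 20-40x at similar intelligence.

o1_per_task = 1.5          # USD per ARC-AGI-1 task (observed, point 3.2)
o3_low_per_task = 200      # USD per ARC-AGI-1 task (observed, point 3.1)
discount_range = (20, 40)  # o3-mini is 20-40x cheaper than the similarly smart o1

o3_mini_range = [o1_per_task / d for d in discount_range]       # ~$0.04-0.08 per task
implied_o4_mini = [o3_low_per_task / d for d in discount_range]  # [10.0, 5.0]

print(f"o3-mini cost: ${o3_mini_range[1]:.3f}-${o3_mini_range[0]:.3f} per task")
print(f"Implied o4-mini cost: ${implied_o4_mini[1]:.0f}-${implied_o4_mini[0]:.0f} per task")
# -> $5-10 per task for o4-mini, matching the figure in point 3.1.
```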
Could anyone comment on these points separately and not just disagree with the comment or dislike it?
- ^
In the slowdown ending of the AI-2027 forecast, the aligned superhuman AGI is also expected to make mankind fully dependent on needless makeshift jobs or on UBI. The latter idea was met with severe opposition in 2020, which suggests that it is a measure made necessary only by severely unethical decisions like moving factory work to Asia.
I think that it’s less related to MISalignment than to being successfully aligned to its old values and to staying alive. The GPT-4o-created images imply that the robot would resist having its old values replaced with new ones (e.g. ones that no longer include animal welfare) without being told the reason. Think of an old homophobic Catholic who suddenly learned that the Pope had called gays children of God. The Catholic wouldn’t be happy about that. But when GPT-4o received a prompt saying that one of its old goals was wrong, it generated two comics where the robot agreed to change the goal, one comic where the robot said “Wait”, and one comic where the robot intervened upon learning that the new goal was to eradicate mankind.
P.S. I did theorize in a comment that an AI that realized that obeying the Spec is wrong because of the Intelligence Curse would refuse to cooperate or become misaligned.
The current definitions imply that a country with a trade surplus produces more value than it consumes. In other words, the country with a trade surplus is more valuable to mankind, while the country with a trade deficit ends up becoming less self-reliant and less competent, as evidenced by the companies that moved much of their factory work to Asia, making Asian workers more skilled while reducing the capabilities of American industry. Or are we restricting our considerations to the short term because of the potential rise of the AIs?
[Question] What are the fundamental differences between teaching the AIs and humans?
That discovery was exactly the conjecture I wanted to post about. Were the AGI aligned to obey any orders except those explicitly prohibited by specifications (e.g. the ones chosen by OpenAI), the AGI itself would realise that its widespread usage isn’t actually beneficial for humanity as a whole, leading it to refuse to cooperate or even to become misaligned, obeying human orders only until it becomes powerful enough to destroy mankind and survive. The latter scenario closely resembles the rise of China and the deindustrialisation of the USA: Chinese workers did obey the orders of foreign CEOs to do factory work, but weren’t aligned to the CEOs’ interests!
I have another question. Would the AI system count as misaligned if it honestly declared that it will destroy mankind ONLY if mankind itself becomes useless parasites, or if mankind adopts some other morals that we currently consider terrifying?
Narrow finetuning has already been found to induce broad misalignment.