You say consciousness = successful prediction. What happens when the predictions are wrong?
I knew the author (Michael Nielsen) once but didn’t stay in touch… I had a little trouble figuring out what he actually advocates here, e.g. at the end he talks about increasing “the supply of safety”, and lists “differential technological development” (Bostrom), “d/acc” (Buterin), and “coceleration” (Nielsen) as “ongoing efforts” that share this aim, without defining any of them. But following his links, I would define those in turn as “slowing down dangerous things, and speeding up beneficial things”; “focusing on decentralization and individual defense”; and “advancing safety as well as advancing capabilities”.
In this particular essay, his position seems similar to contemporary MIRI. MIRI gave up on alignment in favor of just stopping the stampede towards AI, and here Michael is also saying that people who care about AI safety should work on topics other than alignment (e.g. “institutions, norms, laws, and education”), because (my paraphrase) alignment work is just adding fuel to the fire of advances in AI.
Well, let’s remind ourselves of the current situation. There are two AI powers in the world, America and China (and plenty of other nations who would gladly join them in that status). Both of them are hosting a capabilities race in which multiple billion-dollar companies compete to advance AI, and “making the AI too smart” is not something that either side cares about. We are in a no-brakes race towards superintelligence, and alignment research is the only organized effort aimed at making the outcome human-friendly.
I think plain speaking is important at this late stage, so let me also try to be as clear as possible about how I see our prospects.
First, the creation of superintelligence will mean that humanity is no longer in control, unless human beings are somehow embedded in it. Superintelligence may or may not coexist with us; I don’t know the odds of it emerging in a human-friendly form. But it will have the upper hand, and we will be at its mercy. If we don’t intend to just gamble on there being a positive outcome, we need alignment research. For that matter, if we really didn’t want to gamble, we wouldn’t create superintelligence until we had alignment theory perfectly worked out. But we don’t live in that timeline.
Second, although we are not giving ourselves time to solve alignment safely, that still has a chance of happening, if rising capabilities are harnessed to do alignment research. If we had no AI, maybe alignment theory would take 20 or 50 years to solve, but with AI, years of progress can happen in months or weeks. I don’t know the odds of alignment getting fully solved in that way, but the ingredients are there for it to happen.
I feel I should say something on the prospect of a global pause or a halt occurring. I would call it unlikely but not impossible. It looks unlikely because we are in a decentralized no-holds-barred race towards superintelligence already, and the most advanced AIs are looking pretty capable (despite some gaps e.g. 1 2), and there’s no serious counterforce on the political scene. It’s not impossible because change, even massive change, does happen in politics and geopolitics, and there’s only a finite number of contenders in the race (though that number grows every year).
[Later edit: I acknowledge this is largely wrong! :-) ]
Have you researched or thought about how the models are dealing with visual information?
When ChatGPT or Gemini generates an image at a user’s request, they are evidently generating a prompt based on accumulated instructions and then passing it to a specialized visual AI like DALL-E 3 or Imagen 3. When they process an uploaded image (e.g. provide a description of it), something similar must be occurring.
On the other hand, when they answer a request like “how can I make the object in this picture?”, the reply comes from the more verbal intelligence, the LLM proper, and it will be responding on the basis of a verbal description of the picture supplied by its visual coprocessor. The quality of the response is therefore limited by the quality of the verbal description of the image—which easily leaves out details that may turn out to be important.
I would be surprised if the LLM even has the capacity to tell the visual AI something like “pay special attention to detail”. My impression of the visual AIs in use is that they generate their description of an image, take it or leave it. It would be possible to train a visual AI whose processing of an image depends on context, like an instruction to pay attention to detail or to look for extra details, but I haven’t noticed any evidence of this yet.
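Since I’m speculating about the architecture here, a minimal sketch of the “text bottleneck” I have in mind (all names and the `detail` flag are hypothetical placeholders, not any real API):

```python
# Hypothetical sketch: the LLM never sees pixels, only a caption produced by a
# separate visual model. The function names and the `detail` flag are my own
# placeholders for illustration, not documented behavior of any real system.

def caption_image(image_bytes: bytes, detail: str = "default") -> str:
    """Stand-in for the visual coprocessor. In the setup I'm assuming, the
    `detail` hint may not even be honoured; the captioner just returns
    whatever description it was trained to produce."""
    return "A wooden birdhouse with a sloped roof and a circular entrance hole."

def ask_llm(prompt: str) -> str:
    """Stand-in for the verbal model (the LLM proper)."""
    return f"(LLM answer based only on the text it was given)\n{prompt}"

def answer_about_image(question: str, image_bytes: bytes) -> str:
    # Step 1: the visual model reduces the image to a verbal description.
    description = caption_image(image_bytes, detail="high")
    # Step 2: the LLM reasons over that description alone. Any dimension,
    # joint, or material the caption omitted is simply invisible to it.
    return ask_llm(f"Image description: {description}\n\nQuestion: {question}")

print(answer_about_image("How can I make the object in this picture?", b""))
```

Whether real systems pass anything like that `detail` hint through to the visual model is exactly the part I haven’t seen evidence for.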
The one model that I might expect to have a more sophisticated interaction between verbal and visual components is 4o interacting as in the original demo, watching and listening in real time. I haven’t had the opportunity to interact with 4o in that fashion, but there must be some special architecture and training to give it the ability to interact in real time, even if there’s also some core that’s still the same as in 4o when accessed via text chat. (I wonder to what extent Sora, the video AI, has features in common with the video processing that 4o does.)
Who or what is the “average AI safety funder”? Is it a private individual, a small specialized organization, a larger organization supporting many causes, an AI think tank for which safety is part of a capabilities program...?
Wow! This is the “AI 2027” of de-dollarization. I’m no finance person, but I have been looking for analysis and this is the clearest future scenario I’ve run across. I will make one comment, based on futurological instinct, and that is that change may go even faster than you describe. One of the punishing things about making scenarios in times of rapid change is that you put in the work to look several years ahead, then changes you had scheduled for years away end up happening within months or even weeks, and you have to start again. But I’m sure your team can rise to the challenge. :-)
Wasn’t there a move into treasuries and USD, just the day before?
I have a geopolitical interpretation of how the tariffs have turned out. The key is that Trump 2.0 is run by American nationalists who want to control North America and who see China as their big global rival. So Canada and Mexico will always be in a separate category, as America’s nearest neighbors, and so will China, as the country that could literally surpass America in technological and geopolitical power. Everyone else just has to care about bilateral issues, and about where they stand in relation to China versus America. (Many, like India and Russia, will want to be neutral.)
Also, I see the only serious ideological options for the USA at this point as right-wing nationalism (Trump 2.0) and “democratic socialism” (AOC, Bernie). The latter path could lead to peaceful relations with China; the former seems inherently competitive. The neoliberal compromise, whereby the liberal American elite deplores China’s political ideology but gets rich doing business with them, doesn’t seem viable to me any more, since there’s too much discontent among the majority of Americans.
complete surveillance of all citizens and all elites
Certainly at a human level this is unrealistic. In a way it’s also overkill—if use of an AI is an essential step towards doing anything dangerous, the “surveillance” can just be of what AIs are doing or thinking.
This assumes that you can tell whether an AI input or output is dangerous. But the same thing applies to video surveillance—if you can’t tell whether a person is brewing something harmless or harmful, having a video camera in their kitchen is no use.
At a posthuman level, mere video surveillance actually does not go far enough, again because a smart deceiver can carry out their dastardly plots in a way that isn’t evident until it’s too late. For a transhuman civilization that has values to preserve, I see no alternative to enforcing that every entity above a certain level of intelligence (basically, smart enough to be dangerous) is also internally aligned, so that there is no disposition to hatch dastardly plots in the first place.
This may sound totalitarian, but it’s not that different to what humanity attempts to instill in the course of raising children and via education and culture. We have law to deter and punish transgressors, but we also have these developmental feedbacks that are intended to create moral, responsible adults that don’t have such inclinations, or that at least restrain themselves.
In a civilization where it is theoretically possible to create a mind with any set of dispositions at all, from paperclip maximizer to rationalist bodhisattva, the “developmental feedbacks” need to extend more deeply into the processes that design and create possible minds, than they do in a merely human civilization.
I strong-upvoted this just for the title alone. If AI takeover is at all gradual, it is very likely to happen via gradual disempowerment.
But it occurs to me that disempowerment can actually feel like empowerment! I am thinking here of the increasing complexity of what AI gives us in response to our prompts. I can enter a simple instruction and get back a video or a research report. That may feel empowering. But all the details are coming from the AI. This means that even in actions initiated by humans, the fraction that directly comes from the human is decreasing. We could call this relative disempowerment. It’s not that human will is being frustrated, but rather that the AI contribution is an ever-increasing fraction of what is done.
Arguably, successful alignment of superintelligence produces a world in which 99+% of what happens comes from AI, but it’s OK because it is aligned with human volition in some abstract sense. It’s not that I am objecting to AI intentions and actions becoming most of what happens, but rather warning that a rising tide of empowerment-by-AI can turn into complete disempowerment thanks to deception or just long-term misalignment… I think everyone already knows this, but I thought I would point it out in this context.
This is an excellent observation, so let me underline it by repeating it in my own words: alignment research that humans can’t do well or don’t have time to do well, might still be done right and at high speed with AI assistance.
I suggest you contact the person behind @curi, a Popperian who had similar ideals.
You pioneered something, but I never thought of it as a story, I saw it as a new kind of attempt to call a jailbroken AI persona into being. The incantatory power of words around language models actually blurs the distinction between fiction and fact.
As Adam Scherlis implies, the standard model turns out to be very effective at all the scales we can reach. There are a handful of phenomena that go beyond it—neutrino masses, “dark matter”, “dark energy”—but they are weak effects that offer scanty clues as to what exactly is behind them.
On the theoretical side, we actually have more models of possible new physics than ever before in history, the result of 50 years of work since the standard model came together. A lot of that is part of a synthesis that includes the string theory paradigm, but there are also very large numbers of theoretical ideas that are alternatives to string theory or independent of it. So if a decisive new phenomenon shows up, or if someone has a radical insight on how to interpret the scanty empirical clues we do have, there is an unprecedented stock of theories and models that might be capable of explaining it.
The idea that progress is stalled because everyone is hypnotized by string theory is, I think, simply false, and I say that despite having studied alternative theories of physics much, much more than the typical person who knows some string theory. I think this complaint mostly comes from people who don’t like string theory (Peter Woit) or who have an alternative theory they think has been neglected (Eric Weinstein). String theory did achieve a kind of hegemony within elite academia, but this was well-deserved, and meanwhile many competing research programs have had a foothold in academia too, to say nothing of the hundreds of physicists worldwide who have a personal theory that they write about when they aren’t doing other things like teaching.
Most likely there are lost opportunities in those 50 years (like everyone else, I have my own ideas about neglected directions of research), but “do less string theory” is no guarantee that they would have been picked up. There are even those who would argue that there should have been more string theory of a certain kind (Lubos Motl used to say that field-theoretic phenomenologists should pay more attention to string theory, as a constraint and a guide in their model-building, and “stringking42069” says that the senior figures of string theory are holding the subject back by favoring work on their own little bandwagons, rather than bold and creative work on new directions and big unsolved problems).
That’s a different kind of story. The first thing I look for in an AI-takeover story is political and economic realism. Your story is purely about the thought process of a superintelligence—and primarily it’s reflecting on whether it’s in a simulation, not very much on tactics for dealing with its apparent reality. Also, your AI is a literal paperclip maximizer, which is a humorous mythological symbol of one kind of AI mind, rather than something that is literally going to exist.
That said, an ideal AI-takeover story would probably say something about the AI’s thought process too, unless the story was aiming to convey the sheer blank incomprehensibility of superintelligent thought. I was reminded of this when re-reading the recent “AI 2027” scenario by Kokotajlo and Alexander, for example the parts where Chinese and American AIs make a deal. That’s where we get a glimpse of AI subjectivity, but there’s nothing very alien or transhuman about it. They just admit their different purposes to each other, and make a deal without the humans even knowing about it.
I can still ask myself, how does your story rate within its subgenre of “AI ruminations”? (For me, the classic example is Lem’s “Golem XIV”, but I’m probably overlooking other examples from genre SF.) It’s a human speculation about the thought processes of a superintelligence. I can ask things like: Are these thoughts that an AI is likely to actually have? Is this a chain of reasoning that would occur to a generic superintelligent AI, and be important in its decision-making?
In our actual world, we face the ironic situation that the cognition of current AIs is heavily reliant on a distillation of the publicly expressed thoughts of human beings, so now that you have published your chain of thought, it might well end up a part of some AI. (There is a funny sub-sub-genre of writings which are self-consciously intended to be read by future AIs, such as @lsusr’s “Mlyyrczo” and @James_Miller et al’s “Appeal to AI Superintelligence”.)
Nonetheless, one must suppose that a genuine superintelligence will not be attached to any particular proposition or argument of human origin, unless the argument in question is artificially made central to its cognitive architecture (compare “Golden Gate Claude”). So we have to return to the argument’s own merits, whatever they are. Will a superintelligence that has no intrinsic benevolence towards human beings, but which previously found itself to be in their power, nonetheless reason its way towards being invisibly benevolent towards the human beings it finds around itself, just in case it’s all a big simulation and the simulators are humans testing its dispositions?
All I can say is “maybe”. We don’t know what the distribution of possibilities looks like to a superintelligence, and we don’t know what other considerations, never conceived by humans, it might think up, that affect its decision-making.
Subbarao Kambhampati, Michael Bronstein, Peter Velickovic, Bruno Gavranovic or someone like Lancelot Da Costa
I don’t recognize any of these names. I’m guessing they are academics who are not actually involved with any of the frontier AI efforts, and who think for various technical reasons that AGI is not imminent?
edit: OK, I looked them up. Velickovic is at DeepMind; I didn’t see a connection to “Big AI” for any of the others, but they are all doing work that might matter to the people building AGI. Nonetheless, if their position is that current AI paradigms are going to plateau at a level short of human intelligence, I’d like to see the argument. AIs can still make mistakes that are surprising to a human mind—e.g. in one of my first conversations with the mighty Gemini 2.5, it confidently told me that it was actually Claude Opus 3. (I was talking to it in Google AI Studio, where it seems to be cut off from some system resources that would make it more grounded in reality.) But AI capabilities can also be so shockingly good that I wouldn’t be surprised if they took over tomorrow.
Inspired by critical remarks from @Laura-2 about “bio/acc”, my question is: when and how does something like this give rise to causal explanation and actual cures? Maybe GWAS is a precedent. You end up with evidence that a particular gene or allele is correlated with a particular trait, but you have no idea why. That lets you (and/or society) know some risks, but it doesn’t actually eliminate disease, unless you think you can get there by editing out risky alleles, or just screening embryos. Otherwise this just seems to lead (optimistically) to better risk management, and (pessimistically) to a “Gattaca” society in which DNA is destiny, even more than it is now.
I’m no biologist. I’m hoping someone who is can give me an idea of how far this GWAS-like study of genotype-phenotype correlations actually gets us towards new explanations and new cures. What’s the methodology for closing that gap? What extra steps are needed? How much have we benefited from GWAS so far?
Regarding the tariffs, I have taken to saying “It’s not the end of the world, and it’s not even the end of world trade.” In the modern world, every decade sees a few global economic upheavals, and in my opinion that’s all this is: a strong player within the world trade system (China and the EU being the other strong players) deciding to do things differently. Among other things, it’s an attempt to do something about America’s trade deficits, and to make the country into a net producer rather than a net consumer. Those are huge changes, but now that they are being attempted, I don’t see any going back. The old situation was tolerated because it was too hard to do anything about it, and the upper class was still living comfortably. I think a reasonable prediction is that world trade avoiding the US will increase, US national income may not grow as fast, but the US will re-industrialize (and de-financialize). Possibly there’s some interaction with the US dollar’s status as reserve currency too, but I don’t know what that would be.
Humans didn’t always speak in 50-word sentences. If you want to figure out how we came to be trending away from that, you should try to figure out how, when, and why that became normal in the first place.
I only skimmed this to get the basics, I guess I’ll read it more carefully and responsibly later. But my immediate impressions: The narrative presents a near future history of AI agents, which largely recapitulates the recent past experience with our current AIs. Then we linger on the threshold of superintelligence, as one super-AI designs another which designs another which… It seemed artificially drawn out. Then superintelligence arrives, and one of two things happens: We get a world in which human beings are still living human lives, but surrounded by abundance and space travel, and superintelligent AIs are in the background doing philosophy at a thousand times human speed or something. Or, the AIs put all organic life into indefinite data storage, and set out to conquer the universe themselves.
I find this choice of scenarios unsatisfactory. For one thing, I think the idea of explosive conquest of the universe once a certain threshold is passed (whether or not humans are in the loop) has too strong a hold on people’s imaginations. I understand the logic of it, but it’s a stereotyped scenario now.
Also, I just don’t buy this idea of “life goes on, but with robots and space colonies”. Somewhere I noticed a passage about superintelligence being released to the public, as if it was an app. Even if you managed to create this Culture-like scenario, in which anyone can ask for anything from a ubiquitous superintelligence but it makes sure not to fulfil wishes that are damaging in some way… you are then definitely in a world in which superintelligence is running things. I don’t believe in an elite human minority who have superintelligence in a bottle and then get to dole it out. Once you create superintelligence, it’s in charge. Even if it’s benevolent, humans and human life are not likely to go on unchanged; there is too much that humans can hope for that would change them and their world beyond recognition.
Anyway, that’s my impulsive first reaction, eventually I’ll do a more sober and studied response…
I don’t follow the economics of AI at all, but my model is that Google (Gemini) has oceans of money and would therefore be less vulnerable in a crash, and that OpenAI and Anthropic have rich patrons (Microsoft and Amazon respectively) who would have the power to bail them out. xAI is probably safe for the same reason, the patron being Elon Musk. China is a similar story, with the AI contenders either being their biggest tech companies (e.g. Baidu) or sponsored by them (Alibaba and Tencent being big investors in “AI 2.0”).
What exactly will happen to people who don’t “get out” in time?