Started promisingly, but like everyone else, I don’t believe in the ten-year gap from AGI to ASI. If anything, we got a kind of AGI in 2022 (with ChatGPT), and we’ll get ASI by 2027, from something like your “cohort of Shannon instances”.
For my part, I have been wondering this week what a constructive reply to this would be.
I think your proposed imperatives and experiments are quite good. I hope that they are noticed and thought about. I don’t think they are sufficient for correctly aligning a superintelligence, but they can be part of the process that gets us there.
That’s probably the most important thing for me to say. Anything else is just a disagreement about the nature of the world as it is now, and isn’t as important.
Perhaps he means something like what Keynes said here.
Your desire to do good and your specific proposals are valuable. But you seem to be a bit naive about power, human nature, and the difficulty of doing good even if you have power.
For example, you talk about freeing people under oppressive regimes. But every extant political system and major ideology has some corresponding notion of the greater good, and what you are calling oppressive is supposed to protect that greater good, or to protect the system against encroaching rival systems with different values.
You mention China as oppressive and say Chinese citizens “can do [nothing] to cause meaningful improvement from my perspective”. So what is it when Chinese bring sanitation or electricity to a village, or when someone in the big cities invents a new technology or launches a new service? That’s Chinese people making life better for Chinese. Evidently your focus is on the one-party politics and the vulnerability of the individual to the all-seeing state. But even those have their rationales. The Leninist political system is meant to keep power in the hands of the representatives of the peasants and the workers. And the all-seeing state is just doing what you want your aligned superintelligence to do—using every means it has, to bring about the better world.
Similar defenses can be made of every western ideology, whether conservative or liberal, progressive or libertarian or reactionary. They all have a concept of the greater good, and they all sacrifice something for the sake of it. In every case, such an ideology may also empower individuals, or specific cliques and classes, to pursue their self-interest under the cover of the ideology. But all the world’s big regimes have some kind of democratic morality, as well as a persistent power elite.
Regarding a focus on suffering—the easiest way to abolish suffering is to abolish life. All the difficulties arise when you want everyone to have life, and freedom too, but without suffering. Your principles aren’t blind to this, e.g. number 3 (“spread empathy”) might be considered a way to preserve freedom while reducing the possibility of cruelty. But consider number 4, “respect diversity”. This can clash with your moral urgency. Give people freedom, and they may focus on their personal flourishing rather than on the suffering or the oppressed somewhere else. Do you leave them to do their thing, so that the part of life’s diversity which they embody can flourish, or do you lean on them to take part in some larger movement?
I note that @daijin has already provided a different set of values which are rivals to your own. Perhaps someone could write the story of a transhuman world in which all the old politics has been abolished, and instead there’s a cold war between blocs that have embraced these two value systems!
The flip side of these complaints of mine is that it’s also not a foregone conclusion that if some group manages to create superintelligence and actually knows what they’re doing—i.e. they can choose its values with confidence that those values will be maintained—we’ll just have perpetual oppression worse than death. As I have argued, every serious political ideology has some notion of the greater good that is part of the ruling elite’s culture. That elite may contain a mix of cynics, the morally exhausted and self-interested, the genuinely depraved, and those born to power, but it will also contain people who are fighting for an ideal, and new arrivals with bold ideas and a desire for change; and also those who genuinely see themselves as lovers of their country or their people or humanity, but who also have an enormously high opinion of themselves. The dream of the last kind of person is not some grim hellscape; it’s a utopia of genuine happiness where they are also worshipped as transhumanity’s greatest benefactor.
Another aspect of what I’m saying is that you feel this pessimistic about the world because you are alienated from all the factions who actually wield power. If you were part of one of those elite clubs that actually has a chance of winning the race to create superintelligence, you might have a more benign view of the prospect that they end up wielding supreme power.
I don’t have a detailed explanation, but the user is posting a series of assignment or exam questions. Some of them are about “abuse”. Gemini is providing an example of verbal abuse.
If I understand you correctly, you want to create an unprecedentedly efficient and coordinated network, made out of intelligent people with goodwill, that will solve humanity’s problems in theory and in practice?
These are my thoughts in response. I don’t claim to know that what I say here is the truth, but it’s a paradigm that makes sense to me.
Strategic global cooperation to stop AI is effectively impossible, and hoping to do it by turning all the world powers into western-style democracies first is really impossible. Any successful diplomacy will have to work with the existing realities of power within and among countries, but even then, I only see tactical successes at best. Even stopping AI within the West looks very unlikely. Nationalization is conceivable, but I think it would have to partly come as an initiative from a cartel of leading companies; there is neither the will nor the understanding in the non-tech world of politics to simply impose nationalization of AI on big tech.
For these reasons, I think the only hope of arriving at a human-friendly future by design rather than by accident is to solve the scientific, philosophical, and design issues involved in the creation of benevolent superhuman AI. Your idea to focus on the creation of “digital people” has a lot in common with this; more precisely, many of the questions that would have to be answered in order to know what you’re doing when creating digital people are also questions that have to be answered in order to know how to create benevolent superhuman AI.
Still, in the end I expect that the pursuit of AI leads to superintelligence, and an adequately benevolent superintelligence would not necessarily be a person. It would, however, need to know what a person is, in a way that isn’t tied to humanity or even to biology, because it would be governing a world in which that “unprecedented diversity of minds” can exist.
Eliezer has argued that it is unrealistic to think that all the scientific, philosophical, and design issues can be solved in time. He also argues that in the absence of a truly effective global pause or ban, the almost inevitable upshot is a superintelligence that reorganizes the world in a way that is unfriendly to human beings, because human values are complex, and so human-friendliness requires a highly specific formulation of what human values are, and of the AI architecture that will preserve and extrapolate them.
The argument that the design issues can’t be resolved in time is strong. They involve a mix of perennial philosophical questions like the nature of the good, scientific questions like the nature of human cognition and consciousness, and avant-garde computer-science issues like the dynamics of superintelligent deep learning systems. One might reasonably expect it to take decades to resolve all these.
Perhaps the best reason for hope here is the use of AI as a partner in solving the problems. Of course this is a common idea; “weak-to-strong generalization”, for example, would be a form of this. It is at least conceivable that the acceleration of discovery made possible by AI could be used to solve all the issues pertaining to friendly superintelligence in years or months, rather than requiring decades. But there is also a significant risk that some AI-empowered group will be getting things wrong while thinking that they are getting it right. It is also likely that even if a way is found to walk the path to a successful outcome (however narrow that path may be), there will be rival factions, all the way to the end, who have different beliefs about what the correct path is.
As for the second proposition I have attributed to Eliezer—that if we don’t know what we’re doing when we cross the threshold to superintelligence, doom is almost inevitable—that’s less clear to me. Perhaps there are a few rough principles which, if followed, greatly increase the odds in favor of a world that has a human-friendly niche somewhere in it.
Who said biological immortality (do you mean a complete cure for ageing?) requires nanobots?
We know individual cell lines can go on indefinitely; the challenge is to have an intelligent multicellular organism that can too.
It’s the best plan I’ve seen in a while (not perfect, but has many good parts). The superalignment team at Anthropic should probably hire you.
Isn’t this just someone rich, spending money to make it look like the market thinks Trump will win?
Doom aside, do you expect AI to be smarter than humans? If so, do you nonetheless expect humans to still control the world?
“Successionism” is a valuable new word.
My apologies. I’m usually right when I guess that a post has been authored by AI, but it appears you really are a native speaker of one of the academic idioms that AIs have also mastered.
As for the essay itself, it involves an aspect of AI safety or AI policy that I have neglected, namely the management of socially embedded AI systems. I have personally neglected this in favor of SF-flavored topics like “superalignment” because I regard the era in which AIs and humans coexist with humans still having the upper hand as a very temporary thing. Nonetheless, we are still in that era right now, and hopefully some of the people working within that frame will read your essay and comment. I do agree that the public health paradigm seems like a reasonable source of ideas, for the reasons that you give.
There’s a lot going on in this essay, but the big point would appear to be: to create advanced AI is to materialize an Unknown Unknown, and why on earth would you expect that to be something you can even understand, let alone something that is sympathetic to you or “aligned” with you?
Then I made a PDF of the article and fed it to Claude Opus and to Google’s Gemini-powered NotebookLM, and both AIs seemed to get the gist immediately, as well as understanding the article’s detailed structure. There is a deep irony in hearing NotebookLM’s pod-people restating the essay’s points in their own words, and agreeing that its warnings make sense.
(edit: looks like I spoke too soon and this essay is 100% pure, old-fashioned, home-grown human)
This appears to be yet another post that was mostly written by AI. Such posts are mostly ignored.
This may be an example of someone who is not a native English speaker, using AI to make their prose more idiomatic. But then we can’t tell how many of the ideas come from the AI as well.
If we are going to have such posts, it might be useful to have them contain an introductory note that says something about the process whereby they were generated, e.g. “I wrote an outline in my native language, and then [specific AI] completed the essay in English”, or, “This essay was generated by the following prompt… this was the best of ten attempts”, and so on.
Is it too much to declare this the manifesto of a new philosophical school, Constantinism?
“each one is annotated with how much utility we estimate it to have”
How are these estimates obtained?
Let me first say what I think alignment (or “superalignment”) actually requires. This is under the assumption that humanity’s AI adventure issues in a superintelligence that dominates everything, and that the problem to be solved is how to make such an entity compatible with human existence and transhuman flourishing. If you think the future will always be a plurality of posthuman entities, including enhanced former humans, with none ever gaining an irrevocable upper hand (e.g. this seems to be one version of e/acc); or if you think the whole race towards AI is folly and needs to be stopped entirely; then you may have a different view.
I have long thought of a benevolent superintelligence as requiring three things: superintelligent problem-solving ability; the correct “value system” (or “decision procedure”, etc); and a correct ontology (and/or the ability to improve its ontology). The first two criteria would not be surprising in the small world of AI safety that existed before the deep learning revolution. They fit a classic agent paradigm like the expected utility maximizer, with alignment (or Friendliness, as we used to say) being a matter of identifying the right utility function.
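To make that classic paradigm concrete, here is a minimal textbook-style sketch (my own illustrative formalization, not something from the post I am replying to): the expected utility maximizer chooses the action

$$a^{*} = \arg\max_{a \in \mathcal{A}} \sum_{s \in \mathcal{S}} P(s \mid a)\, U(s),$$

where $\mathcal{A}$ is its set of available actions, $P(s \mid a)$ its beliefs about outcomes, and $U$ its utility function. On this view, the whole alignment problem collapses into specifying $U$ so that the outcomes it ranks highly really are good ones.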
The third criterion is a little unconventional, and my main motive for it even more so, in that I don’t believe the theories of consciousness and identity that would reduce everything to “computation”. I think they (consciousness and identity) are grounded in “Being” or “substance”, in a way that the virtual state machines of computation are not; that there really is a difference between a mind and a simulation of a mind, for example. This inclines me to think that quantum holism is part of the physics of mind, but that thinking of it just as physics is not enough: you need a richer ontology of which physics is only a formal description. But these are more like the best ideas I’ve had than something I am absolutely sure is true. I am much more confident that purely computational theories of consciousness are radically incomplete than I am about what the correct alternative paradigm is.
The debate about whether the fashionable reductionist theory of the day is correct is as old as science. What does AI add to the mix? On the one hand, there is the possibility that an AI with the “right” value system but the wrong ontology might do something intended as benevolent that misses the mark because it misidentifies something about personhood. (A simple example of this might be that it “uploads” everyone to a better existence, but the uploads aren’t actually conscious, they are just simulations.) On the other hand, one might also doubt the AI’s ability to discover that the ontology of mind, according to which uploads are conscious, is wrong, especially if the AI itself isn’t conscious. If it is superintelligent, it may be able to discover a mismatch between standard human concepts of mind, extrapolated in a standard way, and how reality actually works; but lacking consciousness itself, it might also lack some essential inner guidance on how the mismatch is to be corrected.
This is just one possible story about what we could call a philosophical error in the AI’s cognition and/or the design process that produced it. I think it’s an example of why Wei Dai regards metaphilosophy as an important issue for alignment. Metaphilosophy is the (mostly philosophical) study of philosophy, and includes questions like, what is philosophical thought, what characterizes correct philosophical thought, and, how do you implement correct philosophical thought in an AI? Metaphilosophical concerns go beyond my third criterion, of getting ontology of mind correct; philosophy could also have something to say about problem-solving and about correct values, and even about the entire three-part approach to alignment with which I began.
So perhaps I will revise my superalignment schema and say: a successful plan for superalignment needs to produce problem-solving superintelligence (since the superaligned AI is useless if it gets trampled by a smarter unaligned AI), a sufficiently correct “value system” (or decision procedure or utility function), and some model of metaphilosophical cognition (with particular attention to ontology of mind).
Your title begins “Whimsical Thoughts on an AI Notepad”, so I presume this was written with AI assistance. Please say something about your methods. Did you just supply a prompt? If so, what was it? If you did contribute more than just a prompt, what was your contribution?
Alexander Dugin speaks of “trumpo-futurism” and “dark accelerationism”.
Dugin is a kind of Zizek of Russian multipolar geopolitical thought. He’s always been good at quickly grasping new political situations and giving them his own philosophical sheen. In the past he has spoken apocalyptically of AI and transhumanism, considering them to be part of the threat to worldwide tradition coming from western liberalism. I can’t see him engaging in wishful thinking like “humans and AIs coexist as equals” or “AIs migrate to outer space leaving the Earth for humans”, so I will be interested to see what he says going forward. I greatly regret that his daughter (Daria Dugina) was assassinated, because she was taking a serious interest in the computer age’s ontology of personhood, but from a Neoplatonist perspective; who knows what she might have come up with.