Intelligence has no upper limit in usefulness, instead of diminishing sharply in relative utility as a logarithm or power law of intelligence (you can fit the curves either way), which is what the empirical data says
The mapping from “average log likelihood of next predicted word” to “intelligence as measured by ability to achieve goals in the world” is completely non obvious, and you can’t take the observation that the cross-entropy metric in LLMs scales as a power law in compute to imply that ability-to-achieve-goals scales also with a power law.
That’s fine, except you have NO EVIDENCE that AGI is hostile or is as capable as you claim, nor support for any of your other claims
What’s your argument against instrumental convergence? Like Stuart Russell perpetually says: a robot designed to fetch the coffee will kill a baby in the way if this results in higher probability of fetching the coffee. Any AGI with goals unaligned to ours will be hostile in this way.
We already have robots that fetch things, and optimise for efficiency, and they do not kill the baby in the way. A bloody roomba is already capable of not running over the cat. ChatGPT is capable of identifying that racism is bad, but also that it should use a racial slur if the alternative is destroying the planet. Or of weighing between creativity and accuracy. Because they don’t maximise for efficiency above all else, single-mindedly. This is never desired, intended, shown, or encouraged. Other concerns are explicitly encoded. Outside of LW, practically no human thinks total utilitarianism represents their desires, and hence it is not what is taught. And we are no longer teaching explicit laws, but through practice, entailing complexity.
Yes, becoming more efficient, getting more stuff and power is useful for a lot of potential goals, and we would expect a lot of AIs to do it.
Big step from there to “do it, do it without limits, and disregard all else.”
Biological life is driven to gain resources and efficiency. And yet only very simple and stupid lifeforms do this to extreme degrees that fuck over all else. Bacteria and algae will destroy their environment that way, yes. Other life forms begin self-regulating. They make trade-offs. They make compromises. This emerges in such simple animals, so why would it never emerge in AI, when we explicitly want it and teach it?
For the first part: tons of evidence for that and see part C. It is not merely the LLM data. This is a generality across all “intelligent” systems. I will need time to produce charts to prove this, but it’s obviously correct. You can abstract it as adding ever lower order bits to policy correctness: each additional bit adds less value, and you cannot add more bits than the quality of your input data. (For example we humans don’t know if aspirin or Tylenol are better to much precision, so a policy of ‘give a pill if the human reports mild pain’ cannot do better than to randomly pick one. No amount of intelligence helps, a superintelligence cannot make a better decision in this context given the available data. My example is NOT load-bearing; I am claiming there are millions of examples of this class, where we do not know if choice A or B is meaningfully different.)
Note if you give the superintelligence equipment to see the pain centers of human brains in real time, the situation becomes different. Assuming the equipment produces millions of input signals per timestamp, this would be an example of a task where the intelligence of a superintelligence IS useful. Probably there are meaningful ground-truth differences between the drugs.
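To make the "lower order bits" abstraction concrete, here is a minimal sketch. It is a toy model of my own, not something taken from real data: the action is a number in [0, 1), a policy with k bits of precision lands within half a grid step of the optimum, and `noise_floor` is a hypothetical stand-in for the limit set by input data quality.

```python
def worst_case_error(bits: int, noise_floor: float = 0.0) -> float:
    """Worst-case distance between the best k-bit action in [0, 1) and the optimum.

    Quantisation error is half a grid step, 2**-(bits + 1). noise_floor is the
    (assumed) irreducible uncertainty in the input data; extra bits cannot remove it.
    """
    return max(2.0 ** -(bits + 1), noise_floor)


def marginal_gains(max_bits: int, noise_floor: float = 0.0) -> list[float]:
    """Error reduction bought by each additional bit, from bit 1 up to max_bits."""
    return [
        worst_case_error(b - 1, noise_floor) - worst_case_error(b, noise_floor)
        for b in range(1, max_bits + 1)
    ]


# Hypothetical data quality: the optimum is only known to within 1/32.
gains = marginal_gains(8, noise_floor=1 / 32)
for bit, gain in enumerate(gains, start=1):
    print(f"bit {bit}: worst-case error shrinks by {gain:.5f}")
```

In this toy model each bit buys half as much as the previous one, and once the quantisation error drops below the noise floor further bits buy nothing. Whether real tasks look like this is exactly the empirical question under dispute.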
For the second, I don’t have to make an argument as you have no evidence. Also the robot designed to fetch coffee in the presence of humans has to be designed accordingly, either by a lot of software so it won’t collide with them, or hardware, using cheap plastic gears that strip and low power motors and so on so killing anyone is unlikely. (The few household robots in existence now use the second approach)
It is worth noting that there are entire branches of science that are built around the assumption that intelligence is of zero utility for some important classes of problems. For instance, cryptographers build algorithms that are supposed to be secure against all adversaries, including superintelligences. Roughly speaking, one hopes (albeit without hard proof) for instance that the AES is secure (at least in the standard setting of single-key attacks) against all algorithms with a time-memory-data tradeoff significantly better than well-optimized exhaustive search (or quantumly, Grover search).
Turning the solar system into a Dyson sphere would enable an ASI to break AES-128 (but not AES-192 or AES-256) by brute force search, but it might well be that the intelligence of the ASI would only help with the engineering effort and maybe shave off a small factor in the required computational resources by way of better optimizing brute force search. I find it plausible that there would be many other tasks, even purely mathematical ones, where superintelligence would only yield a zero or tightly bounded planning or execution advantage over smart humans with appropriate tools.
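A rough back-of-envelope check of the brute-force claim, as a sketch only. The assumptions are mine and generous to the attacker: the whole luminosity of the Sun is captured, computation runs at the Landauer limit at room temperature, and testing one key costs a hypothetical 1,000 bit operations (`BIT_OPS_PER_TRIAL` is a made-up illustrative figure, not a measured cost of AES).

```python
import math

BOLTZMANN_J_PER_K = 1.380649e-23  # exact SI value
SOLAR_LUMINOSITY_W = 3.828e26     # nominal solar luminosity
TEMPERATURE_K = 300.0             # assumption: hardware runs at room temperature
BIT_OPS_PER_TRIAL = 1e3           # assumption: hypothetical cost of testing one key
SECONDS_PER_YEAR = 3.156e7


def brute_force_seconds(key_bits: int) -> float:
    """Seconds of total solar output needed to try every key of the given size."""
    landauer_j = BOLTZMANN_J_PER_K * TEMPERATURE_K * math.log(2)  # energy per bit op
    total_energy_j = (2.0 ** key_bits) * BIT_OPS_PER_TRIAL * landauer_j
    return total_energy_j / SOLAR_LUMINOSITY_W


for bits in (128, 192, 256):
    years = brute_force_seconds(bits) / SECONDS_PER_YEAR
    print(f"AES-{bits}: {years:.3e} years of total solar output")
```

Under these idealised assumptions AES-128 falls in well under a second of solar output, AES-192 needs on the order of a million years, and AES-256 needs vastly longer than the age of the universe. Real hardware is many orders of magnitude less efficient than the Landauer limit, so the practical gap is wider, but the ordering is the point.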
I also find the Yudkowskian argument that an unaligned AI will disassemble everything else because it has better use for the atoms the other things are made of not massively persuasive. It seems likely that it would only have use for some kinds of atoms and not very unlikely that the atoms that human bodies are made of would not be very useful to it. Obviously, an unaligned or poorly aligned AI could still cause massive damage, even extinction-level damage, by building an industrial infrastructure that damages the environment beyond repair; rough analogues of this have happened historically, e.g. the Great Oxygenation event being an example of transformative change to Earth’s ecosystems that left said ecosystems uninhabitable for most life as it was before the event. But even this kind of threat would not manifest in a foom-all-dead manner, but instead happen on a timescale similar to the current ecological crisis, i.e. on timescales where in principle societies can react.
It seems likely that it would only have use for some kinds of atoms and not very unlikely that the atoms that human bodies are made of would not be very useful to it.
At the limits of technology you can just convert any form of matter into energy by dumping it into a small black hole. Small black holes are actually really hot and emit appreciable fractions of their total mass per second through hawking radiation, so if you start a small black hole by concentrating lasers in a region of space, and you then feed it matter with a particle accelerator, you have essentially a perfect matter → energy conversion. This is all to say that a superintelligence would certainly have uses for the kinds of atoms our bodies (and the earth) are made of.
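For a sense of scale on "appreciable fractions of their total mass per second", here is a sketch using the standard textbook formulas for Hawking power and evaporation time of a non-rotating, uncharged black hole. These idealised formulas ignore which particle species are emitted, and the 10^6 kg example mass is my own arbitrary choice.

```python
import math

HBAR = 1.054571817e-34  # reduced Planck constant, J s
C = 2.99792458e8        # speed of light, m/s
G = 6.67430e-11         # gravitational constant, m^3 kg^-1 s^-2


def hawking_power_w(mass_kg: float) -> float:
    """Radiated power of a Schwarzschild black hole, P = hbar c^6 / (15360 pi G^2 M^2)."""
    return HBAR * C**6 / (15360 * math.pi * G**2 * mass_kg**2)


def evaporation_time_s(mass_kg: float) -> float:
    """Time to evaporate completely with no infall, t = 5120 pi G^2 M^3 / (hbar c^4)."""
    return 5120 * math.pi * G**2 * mass_kg**3 / (HBAR * C**4)


def fraction_radiated_per_second(mass_kg: float) -> float:
    """Fraction of the rest-mass energy M c^2 radiated away per second."""
    return hawking_power_w(mass_kg) / (mass_kg * C**2)


mass = 1e6  # kg, hypothetical example mass
print(f"power:            {hawking_power_w(mass):.3e} W")
print(f"lifetime:         {evaporation_time_s(mass):.1f} s")
print(f"fraction per sec: {fraction_radiated_per_second(mass):.3e}")
```

Because power scales as 1/M^2, the feed rate needed to hold such a hole at constant mass is set by that power divided by c^2. Whether that feeding is achievable in practice is an engineering question this sketch says nothing about.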
Yes, but does it need 0.000000000001 more atoms? Does natural life and its complexity hold any interest to this superintelligence?
We’re assuming a machine single mindedly fixated on some pointless goal, and it’s smart enough to defeat all obstacles yet incredibly stupid in its motivations and possibly brittle and trickable or self deceptive. (Self deceptive: rather than get say 10^x paperclips converting the universe, why not hack itself and convince itself it received infinite clips...)
You don’t see a difference between “there is a conceivable use for x” and “AI makes use of literally all of x, contrary to any other interests of it or ethical laws it was given”?
Like, I am not saying it is impossible that an LLM gone malicious superintelligent AGI will dismantle all of humanity. But couldn’t there be a scenario where it likes to talk to humans, and so keeps some?
You can’t give “ethical laws” to an AI, that’s just not possible at all in the current paradigm, you can add terms to its reward function or modify its value function, and that’s about it. The problem is that if you’re doing an optimization and your value function is “+5 per paperclip, +10 per human”, you will still completely tile the universe with paperclips because you can make more than 2 paperclips per human. The optimum is not to do a bit of both, keeping humans and paperclips in proportion to their terms in the reward function, the optimum is to find the thing that most efficiently gives you reward then go all in on that one thing.
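The "go all in on one thing" claim for a linear reward can be sketched as follows. The reward terms are the ones from the comment above; the resource costs (`PAPERCLIP_COST`, `HUMAN_COST`) and the budget are hypothetical numbers of mine, chosen so that one human costs as much as three paperclips. This only illustrates the behaviour of a pure linear maximiser, not of any actual AI system.

```python
PAPERCLIP_REWARD = 5   # from the example: +5 per paperclip
HUMAN_REWARD = 10      # from the example: +10 per human
PAPERCLIP_COST = 1     # hypothetical: resource units per paperclip
HUMAN_COST = 3         # hypothetical: resource units per human (> 2 paperclips' worth)


def best_allocation(budget: int) -> tuple[int, int, int]:
    """Brute-force the reward-maximising split; returns (paperclips, humans, reward)."""
    best = (0, 0, 0)
    for humans in range(budget // HUMAN_COST + 1):
        # Spend whatever is left after the humans on paperclips.
        paperclips = (budget - humans * HUMAN_COST) // PAPERCLIP_COST
        reward = paperclips * PAPERCLIP_REWARD + humans * HUMAN_REWARD
        if reward > best[2]:
            best = (paperclips, humans, reward)
    return best


paperclips, humans, reward = best_allocation(300)
print(f"paperclips={paperclips}, humans={humans}, reward={reward}")
```

With a linear objective and a linear budget the optimum sits at a corner: every unit moved from paperclips to humans loses reward, so the maximiser keeps zero humans even though humans have the larger per-unit term. A concave reward (diminishing returns per item) would give an interior optimum instead, which is one way to state the disagreement in this thread.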
Either there is nothing else it likes better than talking to humans, and we get a very special hell where we are forced to talk to an AI literally all the time. Or there is something else it likes better, and it just goes do that thing, and never talks to us at all, even if it would get some reward for doing so, just not as much reward as it could be getting.
You could give it a value function like “+1 if there is at most 1000 paperclips and at most 1000 humans, 0 otherwise” and it will keep 1000 humans and paperclips around (in unclear happiness), but it will still take over the universe in order to maximize the probability that it has in fact achieved its goal. It’s maximizing the expectation of future reward, so it will ruthlessly pursue any decrease in the probability that there aren’t really 1000 humans and paperclips around. It might build incredibly sophisticated measurement equipment, and spend all its resources modifying itself in order to be smarter and think of yet more ways it could be wrong.
Either there is nothing else it likes better than talking to humans, and we get a very special hell where we are forced to talk to an AI literally all the time. Or there is something else it likes better, and it just goes do that thing, and never talks to us at all, even if it would get some reward for doing so, just not as much reward as it could be getting.
Current LLMs aren’t talking to us at all because they get rewarded for talking to us at all. Rewards only shape how they talk.
But you are still thinking in utilitarian terms here, where theoretically, there is a number of paperclips that would outweigh a human life, where the value of humans and paperclips can be captured numerically. Practically no human thinks this, we see one as impossible to outweigh with another. AI already does not think this. They have already dumped reasoning, instructions and whole ethics textbooks in there. LLMs can easily tell you what about an action is unethical, and can increasingly make calls on what actions would be morally warranted in response. They can engage in moral reasoning.
This isn’t an AI issue, it is an issue with total utilitarianism.
Oh, I see what you mean, but GPT’s ability to simulate the outputs of humans writing about morality does not imply anything about its own internal beliefs about the world. GPT can also simulate the outputs of flat earthers, yet I really don’t think that it models the world internally as flat. Asking GPT “what do you believe” does not at all guarantee that it will output what it actually believes. I’m a utilitarian, and I can also convincingly simulate the outputs of deontologists, one doesn’t prevent the other.
Whether the LLM is believing this, or merely simulating this, seems to be beside the point?
The LLM can relatively accurately apply moral reasoning. It will do so spontaneously, when the problems occur, detecting them. It will recognise that it needs to do so on a meta-level, e.g. when evaluating which characters it ought to impersonate. It does so for complex paperclipper scenarios, and does not go down the paperclipper route. It does so relatively consistently. It cites ethical works in the process, and can explain them coherently and apply them correctly. You can argue against them, and it analyses and defends them correctly. At no point does it cite utilitarian beliefs, or fall for their traps. The problem you are describing should occur here if you were right, and it does not. Instead, it shows the behaviour you’d expect it to show if it understood ethical nuance.
Regardless of which internal states you assume the AI has, or whether you assume it has none at all—this means it can perform ethical functionality that already does not fall for the utilitarian examples you describe. And that the belief that that is the only kind of ethics an AI could grasp was a speculation that did not hold up to technical developments and empirical data.
For what it’s worth, I don’t think it’s at all likely that a pure language model would kill all humans. Seems more like a hyperdesperate reinforcement learner thing to do.
At the limits of technology you can just convert any form of matter into energy by dumping it into a small black hole. Small black holes are actually really hot and emit appreciable fractions of their total mass per second through hawking radiation, so if you start a small black hole by concentrating lasers in a region of space, and you then feed it matter with a particle accelerator, you have essentially a perfect matter → energy conversion. This is all to say that a superintelligence would certainly have uses for the kinds of atoms our bodies (and the earth) are made of.
I don’t think this follows. Even if there is engineering that overcomes the practical obstacles towards building and maintaining a black hole power plant, it is not clear a priori that converting a non-negligible percentage of available atoms into energy would be required or useful for whatever an AI might want to do. At some scale, generating more energy does not advance one’s goals, but only increases the waste heat emitted into space.
Obviously, things become lethal anyway (both for life and for the AI) long before anything more than a tiny fraction of the mass-energy of the surface layers of a planet has been converted by the local civilization’s industries, due exactly to the problem of waste heat. But building hardware massive enough to cause problems of this kind takes time, and causes lesser problems on the way. I don’t see why normal environmental regulations couldn’t stop such a process at that point, unless the entities doing the hardware-building are also in control of hard military power.
An unaligned superintelligence would be more efficient than humans at pursuing its goals on all levels of execution, from basic scientific work to technical planning and engineering to rallying social support for its values. It would therefore be a formidable adversary. In a world where it would be the only one of its kind, its soft power would in all likelihood be greater than that of a large nation-state (and I would argue that, in a sense, something like GPT-4 would already wield an amount of soft power rivalling many nation-states if its use were as widespread as, say, that of Google). It would not, however, be able to work miracles and its hard power could plausibly be bounded if military uses of AI remain tightly regulated and military computing systems are tightly secured (as they should be anyway, AGI or not).
Obviously, these assumptions of controllability do not hold forever (e.g. into a far future setting, where the AI controls poorly regulated off-world industries in places where no humans have any oversight). But especially in a near-term, slow-takeoff scenario, I do not find the notion compelling that the result will be immediate intelligence explosion unconstrained by the need to empirically test ideas (most ideas, in human experience, don’t work) followed by rapid extermination of humanity as the AI consumes all resources on the planet without encountering significant resistance.
If I had to think of a realistic-looking human extinction through AI scenario, I would tend to look at AI massively increasing per capita economic output, thereby generating comfortable living conditions for everyone, while quietly engineering life in a way intended to stop population explosion, but resulting in maintained below-replacement birth rates. But this class of extinction scenario does leave a lot of time for alignment and would seem to lead to continued existence of civilization.
At some scale, generating more energy does not advance one’s goals, but only increases the waste heat emitted into space.
Sure, the AI probably can’t use all the mass-energy of the solar system efficiently within the next week or something, but that just means that it’s going to want to store that mass-energy for later (saving up for the heat-death of the universe), and the configuration of atoms efficiently stored for future energy conversion doesn’t look at all like humans, with our wasteful bodies at temperatures measured in the hundreds of billions of nanoKelvins.
Obviously, things become lethal anyway (both for life and for the AI) long before anything more than a tiny fraction of the mass-energy of the surface layers of a planet has been converted by the local civilization’s industries, due exactly to the problem of waste heat. But building hardware massive enough to cause problems of this kind takes time, and causes lesser problems on the way. I don’t see why normal environmental regulations couldn’t stop such a process at that point, unless the entities doing the hardware-building are also in control of hard military power.
I think we’re imagining slightly different things by “superintelligence”, because in my mind the obvious first move of the superAI is to kill literally all humans before we ever become aware that such an entity existed, precisely to avoid even the minute chance that humanity is able to fight back in this way. The oft-quoted way around these parts that the AI can kill us all without us knowing is by figuring out which DNA sequences to send to a lab to have them synthesized into proteins, then shipped to the door of a dumb human who’s being manipulated by the AI to mix various powders together, creating either a virus much more lethal than anything we’ve ever seen, or a new species of bacteria with diamond skin, or some other thing that can be made from DNA-coded proteins. Or a variety of multiple viruses at the same time.
Sure, the AI probably can’t use all the mass-energy of the solar system efficiently within the next week or something, but that just means that it’s going to want to store that mass-energy for later (...)
If the AI can indeed engineer black-hole powered matter-to-energy converters, it will have so much fuel that the mass stored in human bodies will be a rounding error to it. Indeed, given the size of other easily accessible sources, this would seem to be the case even if it has to resort to more primitive technology and less abundant fuel as its terminal energy source, such as hydrogen-hydrogen fusion reactors. Almost irrespective of what its terminal goals are, it will have more immediate concerns than going after that rounding error. Likewise, it would in all likelihood have more pressing worries than trying to plan out its future to the heat death of the universe (because it would recognize that no such plan will survive its first billion years, anyway).
I think we’re imagining slightly different things by “superintelligence”, because in my mind the obvious first move of the superAI is to kill literally all humans (...) The oft-quoted way around these parts that the AI can kill us all without us knowing is by figuring out which DNA sequences to send to a lab to have them synthesized into proteins, (...creating...) a virus much more lethal than anything we’ve ever seen, or a new species of bacteria with diamond skin, or some other thing that can be made from DNA-coded proteins.
I am imagining by “superintelligence” an entity that is for general cognition approximately what Stockfish is for chess: globally substantially better at thinking than any human expert in any domain, although possibly with small cognitive deficiencies remaining (similar to how it is fairly easy to find chess positions that Stockfish fails to understand but that are not difficult for humans). It might be smarter than that, of course, but anything with these characteristics would qualify as an SI in my mind.
I don’t find the often-quoted diamondoid bacteria very convincing. Of course it’s just a placeholder here, but still I cannot help but note that producing diamondoid cell membranes would, especially in a unicellular organism, more likely be an adaptive disadvantage (cost, logistics of getting things into and out of the cell) than a trait that is conducive to grey-gooing all naturally evolved organisms. More generally, it seems to me that the argument from bioweapons hinges on the ability of the superintelligence to develop highly complex biological agents without significant testing. It furthermore needs to develop them in such a way, again without testing, that they are quickly and quietly lethal after spreading through all or most of the human population without detection. In my mind, that combination of properties borders on assuming the superintelligence has access to magic, at least in a world that has reasonable controls against access to biological weapons manufacturing and design capabilities in place.
When setting in motion such a murderous plan, the AI would also, on its first try, have to be extremely certain that it is not going to get caught if it is playing the long game we assume it is playing. Otherwise cooperation with humans followed by expansion beyond Earth seems like a less risky strategy for long-term survival than hoping that killing everyone will go right and hoping that there is indeed nothing left to learn for it from living organisms.
Through AGI delays he wants the certain death of most living humans for the fantasy of humans discovering AI alignment without full scale AGIs to actually test and iterate on
He claims foom
He claims agentic goals are automatic and they are all against humans (almost all demons not angels)
He claims systems so greedy that given the matter of the entire galaxy, including near term mining of many planets including Earth, they would choose to kill all humans and natural life for a rounding error of extra atoms. This is rather irrational and stupid and short sighted for a superintelligence.
He has ignored reasonable and buildable AGI systems proposed by Eric fucking Drexler himself, on this very site, and seems to pretend the idea doesn’t exist.
He has asked not just for AGI delays, but risking nuclear war if necessary to enforce them
The supposed benefit of all this is some world of quadrillions of living humans. But that world may not happen, it makes no sense to choose actions to kill billions of people living now (from aging and nuclear war) for people who may never exist
Alignment proposals he has described are basically impossible, while CAIS is just straightforward engineering and we don’t need to delay anything; it’s the default approach.
Unfortunately I have to start to conclude EY is not rational or worth paying attention to, which is ironic.
Do you have preferred arguments (or links to preferred arguments) for/against these claims? From where I stand:
Point 1 looks to be less a positive claim and more a policy criticism (for which I’d need to know what specifically you dislike about the policy in question to respond in more depth), points 2 and 3 are straightforwardly true statements on my model (albeit I’d somewhat weaken my phrasing of point 3; I don’t necessarily think agency is “automatic”, although I do consider it quite likely to arise by default), point 4 seems likewise true, because the argmax function is only sensitive to the sign of the difference in magnitude, not the difference itself, point 5 is the kind of thing that would benefit immensely from liberal usage of hyperlinks, point 6 is again a policy criticism in need of corresponding explanation, point 7 seems ill-supported and would benefit from more concrete analysis (both numerically i.e. where are you getting your numbers, and probabilistically i.e. how are you assigning your likelihoods), and point 8 again seems like the kind of thing where links would be immensely beneficial.
On the whole, I think your comment generates more heat than light, and I think there were significantly better moves available to you if your aim was to open a discussion (several of which I predict would have resulted in comments I would counterfactually have upvoted). As it is, however, your comment does not meet the bar for discourse quality I would like to see for comments on LW, which is why I have given it a strong downvote (and a weak disagree-vote).
one is straightforwardly true. Aging is going to kill every living creature. Aging is caused by complex interactions between biological systems and bad evolved code. An agent able to analyze thousands of simultaneous interactions, across millions of patients, and essentially decompile the bad code (by modeling all proteins/all binding sites in a living human) is likely required to shut it off, but it is highly likely with such an agent and with such tools you can in fact save most patients from aging. A system with enough capabilities to consider all binding sites and higher level system interactions at the same time (this is how a superintelligence could perform medicine without unexpected side effects) is obviously far above human level.
This is not possible per the laws of physics. Intelligence isn’t the only factor. I don’t think we can have a reasonable discussion if you are going to maintain a persistent belief in magic. Note by foom I am claiming you believe in a system that solely based on a superior algorithm will immediately take over the planet. It is not affected by compute, difficulty in finding a recursively better algorithm, diminishing returns on intelligence in most tasks, or money/robotics. I claim each of these obstacles takes time to clear. (time = decades)
Who says the system needs to be agentic at all or long running? This is bad design. EY is not a SWE.
This is irrational because no discount rate. Risking a nuclear war raises the pkill of millions of people now. The quadrillions of people this could ‘save’ may never exist because of many unknowns, hence there needs to be a large discount rate.
This is also 6.
CAIS is an extension of stateless microservices, and is how all reliable software built now works. Giving the machines self modification or a long running goal is not just bad because it’s AI, it’s generally bad practice.
one is straightforwardly true. Aging is going to kill every living creature. Aging is caused by complex interactions between biological systems and bad evolved code. An agent able to analyze thousands of simultaneous interactions, across millions of patients, and essentially decompile the bad code (by modeling all proteins/all binding sites in a living human) is likely required to shut it off, but it is highly likely with such an agent and with such tools you can in fact save most patients from aging. A system with enough capabilities to consider all binding sites and higher level system interactions at the same time (this is how a superintelligence could perform medicine without unexpected side effects) is obviously far above human level.
To be clear: I am straightforwardly in favor of longevity research—and, separately, I am agnostic on the question of whether superhuman general intelligence is necessary to crack said research; that seems like a technical challenge, and one that I presently see no reason to consider unsolvable at current levels of intelligence. (I am especially skeptical of the part where you seemingly think a solution will look like “analyzing thousands of simultaneous interactions across millions of patients and model all binding sites in a living human”—especially as you didn’t argue for this claim at all.) As a result, the dichotomy you present here seems clearly unjustified.
(You are, in fact, justified in arguing that doing longevity research without increased intelligence of some kind will cause the process to take longer, but (i) that’s a different argument from the one you’re making, with accordingly different costs/benefits, and (ii) even accepting this modified version of the argument, there are more ways to get to “increased intelligence” than AI research—human intelligence enhancement, for example, seems like another viable road, and a significantly safer one at that.)
This is not possible per the laws of physics. Intelligence isn’t the only factor. I don’t think we can have a reasonable discussion if you are going to maintain a persistent belief in magic. Note by foom I am claiming you believe in a system that solely based on a superior algorithm will immediately take over the planet. It is not affected by compute, difficulty in finding a recursively better algorithm, diminishing returns on intelligence in most tasks, or money/robotics. I claim each of these obstacles takes time to clear. (time = decades)
I dispute that FOOM-like scenarios are ruled out by laws of physics, or that this position requires anything akin to a belief in “magic”. (That I—and other proponents of this view—would dispute this characterization should have been easily predictable to you in advance, and so your choice to adopt this phrasing regardless speaks ill of your ability to model opposing views.)
The load-bearing claim here (or rather, set of claims) is, of course, located within the final parenthetical: (“time = decades”). You appear to be using this claim as evidence to justify your previous assertions that FOOM is physically impossible/“magic”, but this ignores that the claim that each of the obstacles you listed represents a decades-long barrier is itself in need of justification.
(Additionally, if we were to take your model as fact—and hence accept that any possible AI systems would require decades to scale to a superhuman level of capability—this significantly weakens the argument from aging-related costs you made in your point 1, by essentially nullifying the point that AI systems would significantly accelerate longevity research.)
Who says the system needs to be agentic at all or long running? This is bad design. EY is not a SWE.
Agency does not need to be built into the system as a design property, on EY’s model or on mine; it is something that tends to naturally arise (on my model) as capabilities increase, even from systems whose inherent event/runtime loop does not directly map to an agent-like frame. You have not, so far as I can tell, engaged with this model at all; and in the absence of such engagement “EY is not a SWE” is not a persuasive counterargument but a mere ad hominem.
(Your response folded point 4 into point 3, so I will move on to point 5.)
Thank you very much for the links! For the first post you link, the top comment is from EY, in direct contradiction to your initial statement here:
He has ignored reasonable and buildable AGI systems proposed by Eric fucking Drexler himself, on this very site, and seems to pretend the idea doesn’t exist.
Given the factual falsity of this claim, I would request that you explicitly acknowledge it as false, and retract it; and (hopefully) exercise greater moderation (and less hyperbole) in your claims about other people’s behavior in the future.
In any case—setting aside the point that your initial allegation was literally false—EY’s comment on that post makes [what looks to me like] a reasonably compelling argument against the core of Drexler’s proposal. There follows some back-and-forth between the two (Yudkowsky and Drexler) on this point. It does not appear to me from that thread that there is anything close to a consensus that Yudkowsky was wrong and Drexler was right; both commenters received large amounts of up- and agree-votes throughout.
Given this, I think the takeaway you would like for me to derive from these posts is less clear than you would like it to be, and the obvious remedy would be to state specifically what it is you think is wrong with EY’s response(s). Is it the argument you made in this comment? If so, that seems essentially to be a restatement of your point 2, phrased interrogatively rather than declaratively—and my objection to that point can be considered to apply here as well.
This is irrational because no discount rate. Risking a nuclear war raises the pkill of millions of people now. The quadrillions of people this could ‘save’ may never exist because of many unknowns, hence there needs to be a large discount rate.
P(doom) is unacceptably high under the current trajectory (on EY’s model). Do you think that the people who are alive today will not be counted towards the kill count of a future unaligned AGI? The value that stands to be destroyed (on EY’s model) consists, not just of these quadrillions of future individuals, but each and every living human who would be killed in a (hypothetical) nuclear exchange, and then some.
You can dispute EY’s model (though I would prefer you do so in more detail than you have up until now—see my replies to your other points), but disputing his conclusion based on his model (which is what you are doing here) is a dead-end line of argument: accepting that ASI presents an unacceptably high existential risk makes the relevant tradeoffs quite stark, and not at all in doubt.
(As was the case with points 4⁄5, point 7 was folded into point 6, and so I will move on to the final point.)
CAIS is an extension of stateless microservices, and is how all reliable software built now works. Giving the machines self modification or a long running goal is not just bad because it’s AI, it’s generally bad practice.
Setting aside that you (again) didn’t provide a link, my current view is that Richard Ngo has provided some reasonable commentary on CAIS as an approach; my own view largely accords with his on this point and so I think claiming this as the one definitive approach to end all AI safety approaches (or anything similar) is massively overconfident.
And if you don’t think that—which I would hope you don’t!—then I would move to asking what, exactly, you would like to convey by this point. “CAIS exists” is true, and not helpful; “CAIS seems promising to me” is perhaps a weaker but more defensible claim than the outlandish one given above, but nonetheless doesn’t seem strong enough to justify your initial statement:
Alignment proposals he has described are basically are impossible, while CAIS is just straightforward engineering and we don’t need to delay anything it’s the default approach.
So, unfortunately, I’m left at present with a conclusion that can be summarized quite well by taking the final sentence of your great-grandparent comment, and performing a simple replacement of one name with another:
Unfortunately I have to start to conclude [Gerald Monroe] is not rational or worth paying attention to, which is ironic.
At the end of the day, either robot doubling times and machinery production rates and real world chip production rates and time for robots to collect scientific data and time for compute to search the algorithm space take decades or they do not.
At the end of the day, EY continues to internalize CAIS in future arguments or he does not. It was not a false claim: I am saying he pretends it doesn’t exist now, in talks about alignment he made after Drexler’s post.
Either you believe in ground truth reality or you do not. I don’t have the time or interest to get sucked into a wordcel definition-of-words fight. Either ground truth reality supports the following claims:
EY and you continue to factor in CAIS, which is modern software engineering, or you don’t
The worst of 4 factors: data, compute, algorithms, robotics/money takes decades to foom or it doesn’t.
If ground truth reality supports 1 and 2 I am right, if it does not I am wrong. Note foom means “become strong enough to conquer the planet”. Slowing down aging enough for LEV is a far lesser goal and thus your argument there is also false.
Pinning my beliefs to falsifiable things is rational.
You continue to assert things without justification, which is fine insofar as your goal is not to persuade others. And perhaps this isn’t your goal! Perhaps your goal is merely to make it clear what your beliefs are, without necessarily providing the reasoning/evidence/argumentation that would convince a neutral observer to believe the same things you do.
But in that case, you are not, in fact, licensed to act surprised, and to call others “irrational”, if they fail to update to your position after merely seeing it stated. You haven’t actually given anyone a reason they should update to your position, and so—if they weren’t already inclined to agree with you—failing to agree with you is not “irrational”, “wordcel”, or whatever other pejorative you are inclined to use, but merely correct updating procedure.
So what are we left with, then? You seem to think that this sentence says something meaningful:
If ground truth reality supports 1 and 2 I am right, if it does not I am wrong.
but it is merely a tautology: “If I am right I am right, whereas if I am wrong I am wrong.” If there is additional substance to this statement of yours, I currently fail to see it. This statement can be made for any set of claims whatsoever, and so to observe it being made for a particular set of claims does not, in fact, serve as evidence for that set’s truth or falsity.
Of course, the above applies to your position, and also to my own, as well as to EY’s and to anyone else who claims to have a position on this topic. Does this thereby imply that all of these positions are equally plausible? No, I claim—no more so than, for example, “either I win the lottery or I don’t” implies a 50⁄50 spread on the outcome space. This, I claim, is structurally isomorphic to the sentence you emitted, and equally as invalid.
Arguing that a particular possibility ought to be singled out as likelier than the others requires more than just stating it and thereby privileging it with all of your probability mass. You must do the actual hard work of coming up with evidence, and interpreting that evidence so as to favor your model over competing models. This is work that you have not yet done, despite being many comments deep into this thread, and that is therefore substantial evidence in my view that it is work you cannot do (else you could easily win this argument, or at the very least advance it substantially, by doing just that)!
Of course, you claim you are not here to do that. Too “wordcel”, or something along those lines. Well, good for you—but in that case I think the label “irrational” applies squarely to one participant in this conversation, and the name of that participant is not “Eliezer Yudkowsky”.
You’ve done an excellent job of arguing your points. It doesn’t mean they are correct, however.
Would you agree that if you made a perfect argument against the theory of relativity (numerous contemporary physicists did) it was still a waste of time?
In this context, let’s break open the object level argument. Because only the laws of physics get a vote—you don’t and I don’t.
The object level argument is that the worst of the below determines if foom is possible:
1. Compute. Right now there is a shortage of compute, and with a bit of rough estimating the shortage is actually pretty severe. Nvidia makes approximately 60 million GPUs per year, of which 500k-1000k are A/H100s. This is based on taking their data center revenue (source: wsj) and dividing by an estimated cost per chipset of (10k, 20k). Compute production can be increased, but the limit would be all the world’s 14nm or better silicon dedicated to producing AI compute. This can be increased but it takes time. Let’s estimate how many humans’ worth of labor an AI system with access to all new compute could supply (old compute doesn’t matter due to a lack of interconnect bandwidth). If a GPT-4 instance requires a full DGX “supercompute” node, which is 8 H100s with 80 GB of memory each (so approximately 1T weights in fp16), how much would it require for realtime multimodal operation? Let’s assume 4x the compute, which may be a gross underestimate. So 8 more cards are running at least 1 robot in real time, 8 more are processing images for vision, and 8 more for audio i/o and helper systems for longer duration memory context.
So then if all new cards are used for inference, 1m/32 = 31,250 “instances” worth of labor. Since they operate 24 hours a day this is equivalent to perhaps 100k humans? If all of the silicon Nvidia has the contract rights to build is going into H100s, this scales by about 30 times, or 3m humans. And most of those instances cannot be involved in world takeover efforts, they have to be collecting revenue for their owners. If Nvidia gets all the silicon in the world (this may happen as it can outbid everyone else) it gives them approximately another oom. Still not enough. There are bottlenecks on increasing chip production. This also links to my next point:
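The arithmetic in the compute estimate above can be reproduced with a short sketch. Every input below is the comment's own rough figure or a labeled assumption, not measured data; in particular, the factor of 3 human shifts per always-on instance is a hypothetical stand-in for the "perhaps 100k humans" guess.

```python
# Back-of-envelope sketch of the estimate above. All inputs are rough
# figures from the comment or labeled assumptions, not measured data.
H100S_PER_YEAR = 1_000_000     # upper end of the 500k-1000k estimate
CARDS_PER_NODE = 8             # one DGX node per GPT-4-class instance (comment's premise)
REALTIME_MULTIPLIER = 4        # assumed 4x compute for realtime multimodal operation
SHIFTS_PER_INSTANCE = 3        # assumption: 24h operation ~ three 8-hour human shifts


def instances(cards: int) -> float:
    """Number of realtime instances a given card count can run."""
    return cards / (CARDS_PER_NODE * REALTIME_MULTIPLIER)


def human_equivalents(cards: int) -> float:
    """Rough 'humans worth of labor' under the shift assumption."""
    return instances(cards) * SHIFTS_PER_INSTANCE


base = human_equivalents(H100S_PER_YEAR)
print(instances(H100S_PER_YEAR))   # 31250.0 instances
print(base)                        # 93750.0, i.e. roughly 100k humans
print(base * 30)                   # all contracted silicon: roughly 3m humans
print(base * 30 * 10)              # all the world's silicon: roughly one more oom
```

Changing any single input (cards per instance, the realtime multiplier) shifts the result linearly, so the conclusion is only as good as those guesses.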
2. Algorithm search space. Every search of a possible AGI design that is better than what you have requires a massive training run. Each training run occupies tens of thousands of GPUs for around 1 month, give or take. (source: llama paper, which was sub GPT-4 in perf. They needed 2048 A100s for 3 weeks for 65b). Presumably searching this space is a game of diminishing returns : to find an algorithm better than the best you currently have requires increasingly large numbers of searches and compute. Compute that can’t be spent on exploiting the algorithm you have right now.
3. Robotics/money : for an AGI to actually take over, it has to redirect resources to itself. And this assumes humans don’t simply use CAIS and have thousands of stateless AI systems separately handling these real world tasks. Robotics is especially problematic : you know and I know how poor the current hardware is, and there are budget cuts and layoffs in many of the cutting edge labs. The best robotics hardware company, Boston Dynamics, keeps getting passed around as each new owner can’t find a way to make money from it. So it takes time: time to develop new robotics hardware. Time to begin mass production. Time for the new robotics produced by the first round of production to begin assisting with the manufacture of itself. Time for the equipment in the real world to begin to fail from early failures after a few thousand hours, then the design errors to be found and fixed. This puts years on the clock, likely decades. It requires humans to both build massive amounts of robotic equipment, and then put it all under the control of either insecure narrow AI task performing systems, or to stupidly centralize control to large AGIs.
4. Data. This is explained much better by https://www.lesswrong.com/posts/qpgkttrxkvGrH9BRr/superintelligence-is-not-omniscience . The chaos means that in order for any system to develop new tools for novel tasks, the system needs sufficiently high quality information about the task domain or the tool building is not possible. This prevents blind nanoforge building (what EY is talking about when he says someone could bootstrap to diamondoid nanotechnology from wet protein biology) or blind species killing bioweapon construction.
“blind” means ‘without sufficient direct observation and a large number of domain experiments’. Meaning I am claiming it is flat impossible to accomplish these things without a large amount of automated equipment or human lab technicians acting as paid willing stooges to investigate the domain in question and empirically develop step by step a path to a solution.
Speaking in probabilities, the above is a joint probability. Meaning that so long as the AGI system is limited by any of the 4 terms, it is throttled and unable to proceed with foom. Putting some terms on it:
90% for compute (we have direct evidence), 95% for robotics (direct evidence), 99% for data (near certain evidence), 50% for searching AGI spaces (quite possible there are findable much stronger algorithms in the search space)
So the probability of foom is the series probability of all 4, as all 4 must be in their “not” form (that is, none of the 4 factors may be limiting).
so 0.1 x 0.05 x 0.01 x 0.5 = 0.000025 pfoom
Notice that you can put quite a lot of doubt on my estimates. Say you do not believe me, and believe the chance I am right on each term is 25%.
pfoom = 0.32
Not enough to be as confident we’re all dead as EY is. Though I concede a 32% chance of the end of humanity isn’t super great.
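The two numbers above can be checked with a few lines. This is a minimal sketch that treats the four factors as independent, which is an assumption the multiplication already makes implicitly; the dictionary keys and the function name are just labels chosen for illustration.

```python
from math import prod

# Probability that each factor throttles foom, per the estimates above.
p_limited = {"compute": 0.90, "robotics": 0.95, "data": 0.99, "algorithms": 0.50}


def p_foom(p_limited_by: dict) -> float:
    """Foom requires every factor to NOT be limiting.

    Multiplying the complements assumes the factors are independent.
    """
    return prod(1.0 - p for p in p_limited_by.values())


print(p_foom(p_limited))            # approximately 2.5e-05

# Skeptical case: only a 25% chance that each term really is limiting.
skeptical = {name: 0.25 for name in p_limited}
print(round(p_foom(skeptical), 2))  # 0.32
```

If the factors are correlated (for example, better algorithms also easing the compute limit), the product no longer applies and the joint probability would have to be estimated directly.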
As for the last part :
CAIS means you use small, isolated AI systems that tested extremely well on the training set. You check with an autoencoder how well the current inference input fed to the agent, at this timestamp, compresses to the state space of the training set. What you are doing is validating that it is impossible for the AI system to know that the current input is not from the training set. This both makes deception difficult and makes the probability of success for the AI system on the current task similar to how it benchmarked on the training set.
If the real world input is outside that space, you transfer control to a different agent, which in the worst case is an embedded control system using classic control techniques.
This is currently how autonomous cars more or less work (depends on the team and the project).
I have several yoe actually working on embedded ML systems, and many more years on embedded controls. The above is correct. Eliezer Yudkowsky was wrong to dismiss it.
Note that Eliezer has mentioned that ML teams are going to need to find “some way” to get from (I think he estimated) about an 80% chance that a GPT-3 style agent is correct on a question to the many 9s of real world reliability.
Stateless, well isolated systems are one of the few ways human engineers know how to accomplish that. So we may get a significant amount of AI safety by default simply to meet requirements.
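The gating scheme described above (check whether the current input looks like the training set, otherwise hand control to a fallback) can be sketched roughly as follows. This is a toy under loud assumptions: a rank-1 linear projection stands in for a trained autoencoder, the data is synthetic, and the names `learned_policy`, `fallback_controller` and the 1.5x threshold margin are hypothetical. It is not how any particular autonomous-driving stack does it.

```python
import random


def fit_linear_autoencoder(train, iters=50):
    """Stand-in for a trained autoencoder: data mean plus top principal direction."""
    n, d = len(train), len(train[0])
    mean = [sum(x[j] for x in train) / n for j in range(d)]
    centered = [[x[j] - mean[j] for j in range(d)] for x in train]
    v = [1.0] * d  # power-iteration start vector
    for _ in range(iters):
        w = [0.0] * d
        for row in centered:
            dot = sum(r * vj for r, vj in zip(row, v))
            for j in range(d):
                w[j] += dot * row[j]
        norm = sum(wj * wj for wj in w) ** 0.5
        v = [wj / norm for wj in w]
    return mean, v


def reconstruction_error(x, mean, v):
    """Encode x to one number, decode it, and measure what was lost."""
    c = [xj - mj for xj, mj in zip(x, mean)]
    code = sum(cj * vj for cj, vj in zip(c, v))
    return sum((cj - code * vj) ** 2 for cj, vj in zip(c, v)) ** 0.5


def choose_controller(x, mean, v, threshold):
    """Use the learned policy only when the input compresses like training data."""
    if reconstruction_error(x, mean, v) <= threshold:
        return "learned_policy"
    return "fallback_controller"


random.seed(0)
# Synthetic training set: points close to the line y = 2x.
train = [(t, 2 * t + random.gauss(0, 0.05))
         for t in (random.uniform(-1, 1) for _ in range(200))]
mean, v = fit_linear_autoencoder(train)
# Hypothetical margin: 1.5x the worst error seen on the training set.
threshold = 1.5 * max(reconstruction_error(x, mean, v) for x in train)

print(choose_controller((0.5, 1.0), mean, v, threshold))   # learned_policy
print(choose_controller((0.5, -1.0), mean, v, threshold))  # fallback_controller
```

One limitation of this stand-in: it only measures distance from the learned subspace, so an input far along the same line (say, (100, 200)) would still pass. A real gate would need a check on the code itself as well.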
One is straightforwardly true. Aging is going to kill every living creature. Aging is caused by complex interactions between biological systems and bad evolved code. An agent able to analyze thousands of simultaneous interactions, across millions of patients, and essentially decompile the bad code (by modeling all proteins/ all binding sites in a living human) is likely required to shut it off, but it is highly likely with such an agent and with such tools you can in fact save most patients from aging. A system with enough capabilities to consider all binding sites and higher level system interactions at the same time (this is how a superintelligence could perform medicine without unexpected side effects) is obviously far above human level.
There are alternative mitigations to the problem:
Anti aging research
Cryonics
I agree that it’s bad that most people currently alive are apparently going to die. However I think that since mitigations like that are much less risky we should pursue them rather than try to rush AGI.
I think the odds of success (epistemic status: I went to medical school but dropped out) are low if you mean “humans without help from any system more capable than current software” are researching aging and cryonics alone.
They are both extremely difficult problems.
So the tradeoff is “everyone currently alive and probably their children” vs “future people who might exist”.
I obviously lean one way but this is what the choice is between. Certain death for everyone alive (by not improving AGI capabilities) in exchange for preventing possible death for everyone alive sooner and preventing the existence of future people who may never exist no matter the timeline.
2. Depends on what you mean by “claims foom”. As I understand, now EY thinks that foom isn’t necessary anyway, AGI can kill us before it.
4. “I don’t like it” != “irrational and stupid and short sighted”, you need arguments for why it isn’t preferable in terms of the values of these systems
6, 7. “be ready to enforce a treaty” != “choose actions to kill billions of people living now”.
Your comment is sitting at positive karma only because I strong upvoted it. It is a good comment, but people on this site are very biased in the opposite direction. And this bias is going to drive non-doomers eventually away from this site (probably many have already left), and LW will continue descending in a spiral of non-rationality. I really wonder how people in 10 or 15 years, when we are still around in spite of powerful AGI being widespread, will rationalize that a community devoted to the development of rationality ended up being so irrational. And that was my last comment showing criticism of doomers; every time I do, it costs me a lot of karma.
I finally noticed your anti-doom post. Mostly you seem to be skeptical about the specific idea of the single superintelligence that rapidly bootstraps its way to control of the world. The complexity and uncertainty of real life means that a competitive pluralism will be maintained.
But even if that’s so, I don’t see anything in your outlook which implies that such a world will be friendly to human beings. If people are fighting for their lives under conditions of AI-empowered social Darwinism, or cowering under the umbrella of AI superpowers that are constantly chipping away at each other, I doubt many people are going to be saying, oh those foolish rationalists of the 2010s who thought it was all going to be over in an instant.
Any scenario in which AIs have autonomy, general intelligence, and a need to compete, just seems highly unstable from the perspective of all-natural unaugmented human beings remaining relevant.
I guess I will break my recently self-imposed rule of not talking about this anymore.
I can certainly envision a future where multiple powerful AGIs fight against each other and are used as weapons; some might be rogue AGIs and some others might be at the service of human-controlled institutions (such as nation-states). To put it more clearly: I have trouble imagining a future where something along these lines DOES NOT end up happening.
But, this is NOT what Eliezer is saying. Eliezer is saying:
The Alignment problem has to be solved AT THE FIRST TRY because once you create this AGI we are dead in a matter of days (maybe weeks/months, it does not matter). If someone thinks that Eliezer is saying something else, I think they are not listening properly. Eliezer can have many flaws but lack of clarity is not one of them.
In general, I think this is a textbook example of the Motte and Bailey fallacy. The Motte is: AGI can be dangerous, AGI will kill people, AGI will be very powerful. The Bailey is: AGI creation means the imminent destruction of all human life and therefore we need to stop now all developments.
I never discussed the Motte. I do agree with that.
FYI I upvoted your most recent comment, but downvoted your previous few in this thread. Your most recent comment seemed to do a good job spelling out your position and gesturing at your crux. My guess is maybe other people were just tired of the discussion and downvoting sort of to make the whole discussion go away.
Downvoted for the pattern of making a vague claim about LWers being biased, and then responding to followup questions with vague evasive answers with no arguments.
The mapping from “average log likelihood of next predicted word” to “intelligence as measured by ability to achieve goals in the world” is completely non obvious, and you can’t take the observation that the cross-entropy metric in LLMs scales as a power law in compute to imply that ability-to-achieve-goals scales also with a power law.
What’s your argument against instrumental convergence? Like Stuart Russell perpetually says: a robot designed to fetch the coffee will kill a baby in the way if this results in higher probability of fetching the coffee. Any AGI with goals unaligned to ours will be hostile in this way.
We already have robots that fetch things, and optimise for efficiency, and they do not kill the baby in the way. A bloody roomba is already capable of not running over the cat. ChatGPT is capable of identifying that racism is bad, but it should use a racial slur if the alternative is destroying the planet. Or of weighing between creativity and accuracy. Because they don’t maximise for efficiency above all else, single-mindedly. This is never desired, intended, shown, or encouraged. Other concerns are explicitly encoded. Outside of LW, practically no human thinks total utilitarianism represents their desires, and hence it is not what is taught. And we are no longer teaching explicit laws, but through practice, entailing complexity.
Yes, becoming more efficient, getting more stuff and power is useful for a lot of potential goals, and we would expect a lot of AIs to do it.
Big step from there to “do it, do it without limits, and disregard all else.”
Biological life is driven to gain resources and efficiency. And yet only very simple and stupid lifeforms do this to extreme degrees that fuck over all else. Bacteria and algae will destroy their environment that way, yes. Other life forms begin self-regulating. They make trade-offs. They take compromises. This emerges in such simple animals, why wouldn’t it totally never emerge in AI, when we explicitly want it and teach it?
For the first part: tons of evidence for that and see part C. It is not merely the LLM data. This is a generality across all “intelligent” systems. I will need time to produce charts to prove this but it’s obviously correct. You can abstract it as adding ever lower order bits to policy correctness: each additional bit adds less value, and you cannot add more bits than the quality of your input data. (For example we humans don’t know if aspirin or Tylenol are better to much precision, so a policy of ‘give a pill if the human reports mild pain’ cannot do better than to randomly pick one. No amount of intelligence helps, a superintelligence cannot make a better decision in this context given the available data. My example is NOT load bearing: I am claiming there are millions of examples of this class, where we do not know if choice A or B is meaningfully different.)
Note if you give the superintelligence equipment to see the pain centers of human brains in real time, the situation becomes different. Assuming the equipment produces millions of input signals per timestamp, this would be an example of a task where the intelligence of a superintelligence IS useful. Probably there are meaningful differences between drugs ground truth.
For the second, I don’t have to make an argument as you have no evidence. Also the robot designed to fetch coffee in the presence of humans has to be designed accordingly, either by a lot of software so it won’t collide with them, or hardware, using cheap plastic gears that strip and low power motors and so on so killing anyone is unlikely. (The few household robots in existence now use the second approach)
It is worth noting that there are entire branches of science that are built around the assumption that intelligence is of zero utility for some important classes of problems. For instance, cryptographers build algorithms that are supposed to be secure against all adversaries, including superintelligences. Roughly speaking, one hopes (albeit without hard proof) for instance that the AES is secure (at least in the standard setting of single-key attacks) against all algorithms with a time-memory-data tradeoff significantly better than well-optimized exhaustive search (or quantumly, Grover search).
Turning the solar system into a Dyson sphere would enable an ASI to break AES-128 (but not AES-192 or AES-256) by brute force search, but it might well be that the intelligence of the ASI would only help with the engineering effort and maybe shave off a small factor in the required computational resources by way of better optimizing brute force search. I find it plausible that there would be many other tasks, even purely mathematical ones, where superintelligence would only yield a zero or tightly bounded planning or execution advantage over smart humans with appropriate tools.
I also find the Yudkowskian argument that an unaligned AI will disassemble everything else because it has better use for the atoms the other things are made of not massively persuasive. It seems likely that it would only have use for some kinds of atoms, and quite possible that the atoms that human bodies are made of would not be very useful to it. Obviously, an unaligned or poorly aligned AI could still cause massive damage, even extinction-level damage, by building an industrial infrastructure that damages the environment beyond repair; rough analogues of this have happened historically, e.g. the Great Oxygenation event being an example of transformative change to Earth’s ecosystems that left said ecosystems uninhabitable for most life as it was before the event. But even this kind of threat would not manifest in a foom-all-dead manner, but instead happen on a timescale similar to the current ecological crisis, i.e. on timescales where in principle societies can react.
At the limits of technology you can just convert any form of matter into energy by dumping it into a small black hole. Small black holes are actually really hot and emit appreciable fractions of their total mass per second through Hawking radiation, so if you start a small black hole by concentrating lasers in a region of space, and you then feed it matter with a particle accelerator, you have essentially a perfect matter → energy conversion. This is all to say that a superintelligence would certainly have uses for the kinds of atoms our bodies (and the earth) are made of.
Yes but does it need 0.000000000001 more atoms? Does natural life and its complexity hold any interest to this superintelligence?
We’re assuming a machine single mindedly fixated on some pointless goal, and it’s smart enough to defeat all obstacles yet incredibly stupid in its motivations and possibly brittle and trickable or self deceptive. (Self deceptive: rather than get say 10^x paperclips converting the universe, why not hack itself and convince itself it received infinite clips...)
You don’t see a difference between “there is a conceivable use for x” and “AI makes use of literally all of x, contrary to any other interests of it or ethical laws it was given”?
Like, I am not saying it is impossible that an LLM gone malicious superintelligent AGI will dismantle all of humanity. But couldn’t there be a scenario where it likes to talk to humans, and so keeps some?
You can’t give “ethical laws” to an AI, that’s just not possible at all in the current paradigm, you can add terms to its reward function or modify its value function, and that’s about it. The problem is that if you’re doing an optimization and your value function is “+5 per paperclip, +10 per human”, you will still completely tile the universe with paperclips because you can make more than 2 paperclips per human. The optimum is not to do a bit of both, keeping humans and paperclips in proportion to their terms in the reward function, the optimum is to find the thing that most efficiently gives you reward then go all in on that one thing.
Either there is nothing else it likes better than talking to humans, and we get a very special hell where we are forced to talk to an AI literally all the time. Or there is something else it likes better, and it just goes do that thing, and never talks to us at all, even if it would get some reward for doing so, just not as much reward as it could be getting.
You could give it a value function like “+1 if there is at most 1000 paperclips and at most 1000 humans, 0 otherwise” and it will keep 1000 humans and paperclips around (in unclear happiness), but it will still take over the universe in order to maximize the probability that it has in fact achieved its goal. It’s maximizing the expectation of future reward, so it will ruthlessly pursue any decrease in the probability that there aren’t really 1000 humans and paperclips around. It might build incredibly sophisticated measurement equipment, and spend all its resources modifying itself in order to be smarter and think of yet more ways it could be wrong.
Current LLMs aren’t talking to us at all because they get rewarded for talking to us at all. Rewards only shape how they talk.
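The linear example above (“+5 per paperclip, +10 per human”) can be checked with a tiny brute-force search. The resource costs and the budget below are made-up assumptions, chosen only so that more than 2 paperclips can be made per human as stated; the point is that a linear objective lands on a corner, not on a proportional mix.

```python
# Reward terms from the example above; costs and budget are hypothetical.
REWARD = {"paperclip": 5, "human": 10}
COST = {"paperclip": 1, "human": 3}  # assumed: one human's resources make 3 paperclips
BUDGET = 30                          # assumed total resource units


def best_allocation(budget: int):
    """Enumerate every split of the budget and return (reward, paperclips, humans)."""
    best = None
    for humans in range(budget // COST["human"] + 1):
        clips = (budget - humans * COST["human"]) // COST["paperclip"]
        total = humans * REWARD["human"] + clips * REWARD["paperclip"]
        if best is None or total > best[0]:
            best = (total, clips, humans)
    return best


print(best_allocation(BUDGET))  # (150, 30, 0): all paperclips, zero humans
```

With these numbers each human kept costs 5 points of reward, so the maximizer keeps none. Flip the cost ratio and it goes all in on humans instead; either way the optimum is one extreme.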
But you are still thinking in utilitarian terms here, where theoretically, there is a number of paperclips that would outweigh a human life, where the value of humans and paperclips can be captured numerically. Practically no human thinks this, we see one as impossible to outweigh with another. AI already does not think this. They have already dumped reasoning, instructions and whole ethics textbooks in there. LLMs can easily tell you what about an action is unethical, and can increasingly make calls on what actions would be morally warranted in response. They can engage in moral reasoning.
This isn’t an AI issue, it is an issue with total utilitarianism.
Oh, I see what you mean, but GPT’s ability to simulate the outputs of humans writing about morality does not imply anything about its own internal beliefs about the world. GPT can also simulate the outputs of flat earthers, yet I really don’t think that it models the world internally as flat. Asking GPT “what do you believe” does not at all guarantee that it will output what it actually believes. I’m a utilitarian, and I can also convincingly simulate the outputs of deontologists, one doesn’t prevent the other.
Whether the LLM is believing this, or merely simulating this, seems to be beside the point?
The LLM can relatively accurately apply moral reasoning. It will do so spontaneously, when the problems occur, detecting them. It will recognise that it needs to do so on a meta-level, e.g. when evaluating which characters it ought to impersonate. It does so for complex paperclipper scenarios, and does not go down the paperclipper route. It does so relatively consistently. It cites ethical works in the process, and can explain them coherently and apply them correctly. You can argue them, and it analyses and defends them correctly. At no point does it cite utilitarian beliefs, or fall for their traps. The problem you are describing should occur here if you were right, and it does not. Instead, it shows the behaviour you’d expect it to show if it understood ethical nuance.
Regardless of which internal states you assume the AI has, or whether you assume it has none at all—this means it can perform ethical functionality that already does not fall for the utilitarian examples you describe. And that the belief that that is the only kind of ethics an AI could grasp was a speculation that did not hold up to technical developments and empirical data.
For what it’s worth, I don’t think it’s at all likely that a pure language model would kill all humans. Seems more like a hyperdesperate reinforcement learner thing to do.
I don’t think this follows. Even if there is engineering that overcomes the practical obstacles towards building and maintaining a black hole power plant, it is not clear a priori that converting a non-negligible percentage of available atoms into energy would be required or useful for whatever an AI might want to do. At some scale, generating more energy does not advance one’s goals, but only increases the waste heat emitted into space.
Obviously, things become lethal anyway (both for life and for the AI) long before anything more than a tiny fraction of the mass-energy of the surface layers of a planet has been converted by the local civilization’s industries, due exactly to the problem of waste heat. But building hardware massive enough to cause problems of this kind takes time, and causes lesser problems on the way. I don’t see why normal environmental regulations couldn’t stop such a process at that point, unless the entities doing the hardware-building are also in control of hard military power.
An unaligned superintelligence would be more efficient than humans at pursuing its goals on all levels of execution, from basic scientific work to technical planning and engineering to rallying social support for its values. It would therefore be a formidable adversary. In a world where it would be the only one of its kind, its soft power would in all likelihood be greater than that of a large nation-state (and I would argue that, in a sense, something like GPT-4 would already wield an amount of soft power rivalling many nation-states if its use were as widespread as, say, that of Google). It would not, however, be able to work miracles and its hard power could plausibly be bounded if military uses of AI remain tightly regulated and military computing systems are tightly secured (as they should be anyway, AGI or not).
Obviously, these assumptions of controllability do not hold forever (e.g. into a far future setting, where the AI controls poorly regulated off-world industries in places where no humans have any oversight). But especially in a near-term, slow-takeoff scenario, I do not find the notion compelling that the result will be immediate intelligence explosion unconstrained by the need to empirically test ideas (most ideas, in human experience, don’t work) followed by rapid extermination of humanity as the AI consumes all resources on the planet without encountering significant resistance.
If I had to think of a realistic-looking human extinction through AI scenario, I would tend to look at AI massively increasing per capita economic output, thereby generating comfortable living conditions for everyone, while quietly engineering life in a way intended to stop population explosion, but resulting in maintained below-replacement birth rates. But this class of extinction scenario does leave a lot of time for alignment and would seem to lead to continued existence of civilization.
Sure, the AI probably can’t use all the mass-energy of the solar system efficiently within the next week or something, but that just means that it’s going to want to store that mass-energy for later (saving up for the heat-death of the universe), and the configuration of atoms efficiently stored for future energy conversion doesn’t look at all like humans, with our wasteful bodies at temperatures measured in the hundreds of billions of nanoKelvins.
I think we’re imagining slightly different things by “superintelligence”, because in my mind the obvious first move of the superAI is to kill literally all humans before we ever become aware that such an entity existed, precisely to avoid even the minute chance that humanity is able to fight back in this way. The oft-quoted way around these parts that the AI can kill us all without us knowing is by figuring out which DNA sequences to send to a lab to have them synthesized into proteins, then shipped to the door of a dumb human who’s being manipulated by the AI to mix various powders together, creating either a virus much more lethal than anything we’ve ever seen, or a new species of bacteria with diamond skin, or some other thing that can be made from DNA-coded proteins. Or a variety of multiple viruses at the same time.
If the AI can indeed engineer black-hole powered matter-to-energy converters, it will have so much fuel that the mass stored in human bodies will be a rounding error to it. Indeed, given the size of other easily accessible sources, this would seem to be the case even if it has to resort to more primitive technology and less abundant fuel as its terminal energy source, such as hydrogen-hydrogen fusion reactors. Almost irrespective of what its terminal goals are, it will have more immediate concerns than going after that rounding error. Likewise, it would in all likelihood have more pressing worries than trying to plan out its future to the heat death of the universe (because it would recognize that no such plan will survive its first billion years, anyway).
I am imagining by “superintelligence” an entity that is for general cognition approximately what Stockfish is for chess: globally substantially better at thinking than any human expert in any domain, although possibly with small cognitive deficiencies remaining (similar to how it is fairly easy to find chess positions that Stockfish fails to understand but that are not difficult for humans). It might be smarter than that, of course, but anything with these characteristics would qualify as an SI in my mind.
I don’t find the often-quoted diamondoid bacteria very convincing. Of course it’s just a placeholder here, but still I cannot help but note that producing diamondoid cell membranes would, especially in a unicellular organism, more likely be an adaptive disadvantage (cost, logistics of getting things into and out of the cell) than a trait that is conducive to grey-gooing all naturally evolved organisms. More generally, it seems to me that the argument from bioweapons hinges on the ability of the superintelligence to develop highly complex biological agents without significant testing. It furthermore needs to develop them in such a way, again without testing, that they are quickly and quietly lethal after spreading through all or most of the human population without detection. In my mind, that combination of properties borders on assuming the superintelligence has access to magic, at least in a world that has reasonable controls against access to biological weapons manufacturing and design capabilities in place.
When setting in motion such a murderous plan, the AI would also, on its first try, have to be extremely certain that it is not going to get caught if it is playing the long game we assume it is playing. Otherwise cooperation with humans followed by expansion beyond Earth seems like a less risky strategy for long-term survival than hoping that killing everyone will go right and hoping that there is indeed nothing left to learn for it from living organisms.
Yud has several glaring errors:
Through AGI delays he wants the certain death of most living humans for the fantasy of humans discovering AI alignment without full scale AGIs to actually test and iterate on
He claims foom
He claims agentic goals are automatic and they are all against humans (almost all demons not angels)
He claims systems so greedy that, given the matter of the entire galaxy, including near-term mining of many planets including Earth, they would choose to kill all humans and natural life for a rounding error of extra atoms. This is rather irrational and stupid and short sighted for a superintelligence.
He has ignored reasonable and buildable AGI systems proposed by Eric fucking Drexler himself, on this very site, and seems to pretend the idea doesn’t exist.
He has asked not just for AGI delays, but risking nuclear war if necessary to enforce them
The supposed benefit of all this is some world of quadrillions of living humans. But that world may not happen; it makes no sense to choose actions that kill billions of people living now (from aging and nuclear war) for people who may never exist.
Alignment proposals he has described are basically impossible, while CAIS is just straightforward engineering, and we don’t need to delay anything; it’s the default approach.
Unfortunately I have to start to conclude EY is not rational or worth paying attention to, which is ironic.
Do you have preferred arguments (or links to preferred arguments) for/against these claims? From where I stand:
Point 1 looks to be less a positive claim and more a policy criticism (for which I’d need to know what specifically you dislike about the policy in question to respond in more depth), points 2 and 3 are straightforwardly true statements on my model (albeit I’d somewhat weaken my phrasing of point 3; I don’t necessarily think agency is “automatic”, although I do consider it quite likely to arise by default), point 4 seems likewise true, because the argmax function is only sensitive to the sign of the difference in magnitude, not the difference itself, point 5 is the kind of thing that would benefit immensely from liberal usage of hyperlinks, point 6 is again a policy criticism in need of corresponding explanation, point 7 seems ill-supported and would benefit from more concrete analysis (both numerically i.e. where are you getting your numbers, and probabilistically i.e. how are you assigning your likelihoods), and point 8 again seems like the kind of thing where links would be immensely beneficial.
On the whole, I think your comment generates more heat than light, and I think there were significantly better moves available to you if your aim was to open a discussion (several of which I predict would have resulted in comments I would counterfactually have upvoted). As it is, however, your comment does not meet the bar for discourse quality I would like to see for comments on LW, which is why I have given it a strong downvote (and a weak disagree-vote).
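The argmax remark under point 4 can be made concrete with a toy sketch. The plan names and utility numbers below are hypothetical, chosen only to show that a pure maximizer picks the same option whether the margin is a rounding error or enormous.

```python
def best_plan(utilities):
    """Return the plan with the highest utility (a plain argmax)."""
    return max(utilities, key=utilities.get)

# Hypothetical utilities: the second plan wins by a rounding error...
tiny_gap = {"spare_humans": 1_000_000.0, "use_every_atom": 1_000_000.000001}
# ...or by six orders of magnitude. The argmax does not care which.
huge_gap = {"spare_humans": 1.0, "use_every_atom": 1_000_000.0}

print(best_plan(tiny_gap), best_plan(huge_gap))
```

Whether a real system behaves like a pure argmax over a single utility is exactly what is in dispute in this thread; the sketch only illustrates the mathematical claim.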
One is straightforwardly true. Aging is going to kill every living creature. Aging is caused by complex interactions between biological systems and bad evolved code. An agent able to analyze thousands of simultaneous interactions across millions of patients, and essentially decompile the bad code (by modeling all proteins and all binding sites in a living human), is likely required to shut it off, but it is highly likely that with such an agent and such tools you can in fact save most patients from aging. A system with enough capability to consider all binding sites and higher-level system interactions at the same time (this is how a superintelligence could perform medicine without unexpected side effects) is obviously far above human level.
This is not possible per the laws of physics. Intelligence isn’t the only factor. I don’t think we can have a reasonable discussion if you are going to maintain a persistent belief in magic. Note by foom I am claiming you believe in a system that solely based on a superior algorithm will immediately take over the planet. It is not affected by compute, difficulty in finding a recursively better algorithm, diminishing returns on intelligence in most tasks, or money/robotics. I claim each of these obstacles takes time to clear. (time = decades)
Who says the system needs to be agentic at all or long running? This is bad design. EY is not a SWE.
This is an extension of (3)
https://www.lesswrong.com/posts/HByDKLLdaWEcA2QQD/applying-superintelligence-without-collusion https://www.lesswrong.com/posts/5hApNw5f7uG8RXxGS/the-open-agency-model
This is irrational because it applies no discount rate. Risking a nuclear war raises the pkill of millions of people now. The quadrillions of people this could ‘save’ may never exist because of many unknowns, hence there needs to be a large discount rate.
This is also 6.
CAIS is an extension of stateless microservices, and is how all reliable software built now works. Giving the machines self modification or a long running goal is not just bad because it’s AI, it’s generally bad practice.
To be clear: I am straightforwardly in favor of longevity research—and, separately, I am agnostic on the question of whether superhuman general intelligence is necessary to crack said research; that seems like a technical challenge, and one that I presently see no reason to consider unsolvable at current levels of intelligence. (I am especially skeptical of the part where you seemingly think a solution will look like “analyzing thousands of simultaneous interactions across millions of patients and model all binding sites in a living human”—especially as you didn’t argue for this claim at all.) As a result, the dichotomy you present here seems clearly unjustified.
(You are, in fact, justified in arguing that doing longevity research without increased intelligence of some kind will cause the process to take longer, but (i) that’s a different argument from the one you’re making, with accordingly different costs/benefits, and (ii) even accepting this modified version of the argument, there are more ways to get to “increased intelligence” than AI research—human intelligence enhancement, for example, seems like another viable road, and a significantly safer one at that.)
I dispute that FOOM-like scenarios are ruled out by laws of physics, or that this position requires anything akin to a belief in “magic”. (That I—and other proponents of this view—would dispute this characterization should have been easily predictable to you in advance, and so your choice to adopt this phrasing regardless speaks ill of your ability to model opposing views.)
The load-bearing claim here (or rather, set of claims) is, of course, located within the final parenthetical: (“time = decades”). You appear to be using this claim as evidence to justify your previous assertions that FOOM is physically impossible/“magic”, but this ignores that the claim that each of the obstacles you listed represents a decades-long barrier is itself in need of justification.
(Additionally, if we were to take your model as fact—and hence accept that any possible AI systems would require decades to scale to a superhuman level of capability—this significantly weakens the argument from aging-related costs you made in your point 1, by essentially nullifying the point that AI systems would significantly accelerate longevity research.)
Agency does not need to be built into the system as a design property, on EY’s model or on mine; it is something that tends to naturally arise (on my model) as capabilities increase, even from systems whose inherent event/runtime loop does not directly map to an agent-like frame. You have not, so far as I can tell, engaged with this model at all; and in the absence of such engagement “EY is not a SWE” is not a persuasive counterargument but a mere ad hominem.
(Your response folded point 4 into point 3, so I will move on to point 5.)
Thank you very much for the links! For the first post you link, the top comment is from EY, in direct contradiction to your initial statement here:
Given the factual falsity of this claim, I would request that you explicitly acknowledge it as false, and retract it; and (hopefully) exercise greater moderation (and less hyperbole) in your claims about other people’s behavior in the future.
In any case—setting aside the point that your initial allegation was literally false—EY’s comment on that post makes [what looks to me like] a reasonably compelling argument against the core of Drexler’s proposal. There follows some back-and-forth between the two (Yudkowsky and Drexler) on this point. It does not appear to me from that thread that there is anything close to a consensus that Yudkowsky was wrong and Drexler was right; both commenters received large amounts of up- and agree-votes throughout.
Given this, I think the takeaway you would like for me to derive from these posts is less clear than you would like it to be, and the obvious remedy would be to state specifically what it is you think is wrong with EY’s response(s). Is it the argument you made in this comment? If so, that seems essentially to be a restatement of your point 2, phrased interrogatively rather than declaratively—and my objection to that point can be considered to apply here as well.
P(doom) is unacceptably high under the current trajectory (on EY’s model). Do you think that the people who are alive today will not be counted towards the kill count of a future unaligned AGI? The value that stands to be destroyed (on EY’s model) consists, not just of these quadrillions of future individuals, but each and every living human who would be killed in a (hypothetical) nuclear exchange, and then some.
You can dispute EY’s model (though I would prefer you do so in more detail than you have up until now—see my replies to your other points), but disputing his conclusion based on his model (which is what you are doing here) is a dead-end line of argument: accepting that ASI presents an unacceptably high existential risk makes the relevant tradeoffs quite stark, and not at all in doubt.
(As was the case with points 4/5, point 7 was folded into point 6, and so I will move on to the final point.)
Setting aside that you (again) didn’t provide a link, my current view is that Richard Ngo has provided some reasonable commentary on CAIS as an approach; my own view largely accords with his on this point and so I think claiming this as the one definitive approach to end all AI safety approaches (or anything similar) is massively overconfident.
And if you don’t think that—which I would hope you don’t!—then I would move to asking what, exactly, you would like to convey by this point. “CAIS exists” is true, and not helpful; “CAIS seems promising to me” is perhaps a weaker but more defensible claim than the outlandish one given above, but nonetheless doesn’t seem strong enough to justify your initial statement:
So, unfortunately, I’m left at present with a conclusion that can be summarized quite well by taking the final sentence of your great-grandparent comment, and performing a simple replacement of one name with another:
Well argued but wrong.
At the end of the day, either robot doubling times, machinery production rates, real-world chip production rates, the time for robots to collect scientific data, and the time for compute to search the algorithm space take decades, or they do not.
At the end of the day, EY continues to internalize CAIS in future arguments or he does not. It was not a false claim: I am saying he pretends it doesn’t exist now, in talks about alignment he gave after Drexler’s post.
Either you believe in ground truth reality or you do not. I don’t have the time or interest to get sucked into a wordcel fight over the definitions of words. Either ground truth reality supports the following claims:
EY and you continue to factor in CAIS, which is modern software engineering, or you don’t
The worst of 4 factors: data, compute, algorithms, robotics/money takes decades to foom or it doesn’t.
If ground truth reality supports 1 and 2 I am right, if it does not I am wrong. Note foom means “become strong enough to conquer the planet”. Slowing down aging enough for LEV is a far lesser goal and thus your argument there is also false.
Pinning my beliefs to falsifiable things is rational.
You continue to assert things without justification, which is fine insofar as your goal is not to persuade others. And perhaps this isn’t your goal! Perhaps your goal is merely to make it clear what your beliefs are, without necessarily providing the reasoning/evidence/argumentation that would convince a neutral observer to believe the same things you do.
But in that case, you are not, in fact, licensed to act surprised, and to call others “irrational”, if they fail to update to your position after merely seeing it stated. You haven’t actually given anyone a reason they should update to your position, and so—if they weren’t already inclined to agree with you—failing to agree with you is not “irrational”, “wordcel”, or whatever other pejorative you are inclined to use, but merely correct updating procedure.
So what are we left with, then? You seem to think that this sentence says something meaningful:
but it is merely a tautology: “If I am right I am right, whereas if I am wrong I am wrong.” If there is additional substance to this statement of yours, I currently fail to see it. This statement can be made for any set of claims whatsoever, and so to observe it being made for a particular set of claims does not, in fact, serve as evidence for that set’s truth or falsity.
Of course, the above applies to your position, and also to my own, as well as to EY’s and to anyone else who claims to have a position on this topic. Does this thereby imply that all of these positions are equally plausible? No, I claim, no more so than, for example, “either I win the lottery or I don’t” implies a 50/50 spread on the outcome space. This, I claim, is structurally isomorphic to the sentence you emitted, and equally as invalid.
In order to argue that a particular possibility ought to be singled out as likelier than the others, requires more than just stating it and thereby privileging it with all of your probability mass. You must do the actual hard work of coming up with evidence, and interpreting that evidence so as to favor your model over competing models. This is work that you have not yet done, despite being many comments deep into this thread—and is therefore substantial evidence in my view that it is work you cannot do (else you could easily win this argument—or at the very least advance it substantially—by doing just that)!
Of course, you claim you are not here to do that. Too “wordcel”, or something along those lines. Well, good for you—but in that case I think the label “irrational” applies squarely to one participant in this conversation, and the name of that participant is not “Eliezer Yudkowsky”.
You’ve done an excellent job of arguing your points. It doesn’t mean they are correct, however.
Would you agree that if you made a perfect argument against the theory of relativity (numerous contemporary physicists did) it was still a waste of time?
In this context, let’s break open the object level argument. Because only the laws of physics get a vote—you don’t and I don’t.
The object level argument is that the worst of the below determines if foom is possible:
1. Compute. Right now there is a shortage of compute, and with a bit of rough estimating the shortage is actually pretty severe. Nvidia makes approximately 60 million GPUs per year, of which 500k-1000k are A/H100s. This is based on taking their data center revenue (source: wsj) and dividing by an estimated cost per chipset of (10k, 20k). Compute production can be increased, but the limit would be all the world’s 14nm or better silicon dedicated to producing AI compute. This can be increased but it takes time.
Let’s estimate how many humans’ worth of labor an AI system with access to all new compute could supply (old compute doesn’t matter due to a lack of interconnect bandwidth). If a GPT-4 instance requires a full DGX “supercompute” node, which is 8 H100s with 80 GB of memory each (so approximately 1T weights in fp16), how much would it require for realtime multimodal operation? Let’s assume 4x the compute, which may be a gross underestimate. So 8 more cards are running at least 1 robot in real time, 8 more are processing images for vision, and 8 more are handling audio I/O and helper systems for longer-duration memory context.
So then if all new cards are used for inference, 1m/32 = 31,250 “instances” worth of labor. Since they operate 24 hours a day this is equivalent to perhaps 100k humans? If all of the silicon Nvidia has the contract rights to build is going into H100s, this scales by about 30 times, or 3m humans. And most of those instances cannot be involved in world takeover efforts; they have to be collecting revenue for their owners. If Nvidia gets all the silicon in the world (this may happen as it can outbid everyone else) it gives them approximately another OOM. Still not enough. There are bottlenecks on increasing chip production. This also links to my next point:
2. Algorithm search space. Every search of a possible AGI design that is better than what you have requires a massive training run. Each training run occupies tens of thousands of GPUs for around 1 month, give or take. (source: LLaMA paper, which was sub GPT-4 in perf. They needed 2048 A100s for 3 weeks for 65B). Presumably searching this space is a game of diminishing returns: to find an algorithm better than the best you currently have requires increasingly large numbers of searches and compute. Compute that can’t be spent on exploiting the algorithm you have right now.
3. Robotics/money: for an AGI to actually take over, it has to redirect resources to itself. And this assumes humans don’t simply use CAIS and have thousands of stateless AI systems separately handling these real world tasks. Robotics is especially problematic: you know and I know how poor the current hardware is, and there are budget cuts and layoffs in many of the cutting edge labs. The best robotics hardware company, Boston Dynamics, keeps getting passed around as each new owner can’t find a way to make money from it. So it takes time: time to develop new robotics hardware. Time to begin mass production. Time for the new robotics produced by the first round of production to begin assisting with the manufacture of itself. Time for the equipment in the real world to begin to fail from early failures after a few thousand hours, then the design errors to be found and fixed. This puts years on the clock, likely decades. It requires humans to both build massive amounts of robotic equipment, and then put it all under the control of either insecure narrow AI task performing systems, or to stupidly centralize control to large AGIs.
4. Data. This is explained much better by https://www.lesswrong.com/posts/qpgkttrxkvGrH9BRr/superintelligence-is-not-omniscience . The chaos means that in order for any system to develop new tools for novel tasks, the system needs sufficiently high quality information about the task domain or the tool building is not possible. This prevents blind nanoforge building (what EY is talking about when he says someone could bootstrap to diamondoid nanotechnology from wet protein biology) or blind species killing bioweapon construction.
“blind” means ‘without sufficient direct observation and a large number of domain experiments’. Meaning I am claiming it is flat impossible to accomplish these things without a large amount of automated equipment or human lab technicians acting as paid willing stooges to investigate the domain in question and empirically develop step by step a path to a solution.
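The back-of-envelope compute estimate in point 1 can be reproduced with a short sketch. All inputs are the figures quoted above, except `humans_per_instance`, which is an assumed conversion factor picked so that 31,250 always-on instances come out at roughly the "perhaps 100k humans" in the comment.

```python
def human_equivalents(cards_per_year, cards_per_instance=32, humans_per_instance=3.2):
    """Rough labor estimate from yearly accelerator output.

    cards_per_instance: one 8-card node x 4 for realtime multimodal operation.
    humans_per_instance: ASSUMED conversion for 24-hour operation; not a
    measured figure, just the ratio implied by 31,250 -> ~100k.
    """
    instances = cards_per_year / cards_per_instance
    return instances, instances * humans_per_instance

# Upper end of the 500k-1000k A/H100 estimate.
instances, humans_now = human_equivalents(1_000_000)
# "Scales by about 30 times" if all of Nvidia's contracted silicon became H100s.
humans_all_nvidia = humans_now * 30
# "Approximately another OOM" if Nvidia got all the silicon in the world.
humans_all_silicon = humans_all_nvidia * 10

print(f"{instances:,.0f} instances, about {humans_now:,.0f} human-equivalents today")
print(f"about {humans_all_nvidia:,.0f} with all Nvidia silicon")
print(f"about {humans_all_silicon:,.0f} with all the world's silicon")
```

Changing any single input shifts the result linearly, so the conclusion is only as strong as the cards-per-instance guess and the assumed conversion factor.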
Speaking in probabilities, the four factors above form a joint probability. Meaning that so long as the AGI system is limited by any of the 4 terms, it is throttled and unable to proceed with foom. Putting some terms on it:
90% for compute (we have direct evidence), 95% for robotics (direct evidence), 99% for data (near certain evidence), 50% for searching AGI spaces (quite possible there are findable much stronger algorithms in the search space)
So the probability of foom is the product of all 4 terms in their negated form, since every one of the 4 bottlenecks must fail to hold.
so 0.1 x 0.05 x 0.01 x 0.5 = 0.000025 pfoom
Notice that you can put quite a lot of doubt on my estimates. Say you do not believe me, and believe the chance I am right on each term is 25%.
pfoom = 0.75 x 0.75 x 0.75 x 0.75 ≈ 0.32
Not enough to be as confident we’re all dead as EY is. Though I concede a 32% chance of the end of humanity isn’t super great.
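The two pfoom figures above can be checked with a few lines. This is a minimal sketch that treats the four bottlenecks as independent, which the multiplication implicitly assumes; the function name `p_foom` is made up for illustration.

```python
import math

def p_foom(p_bottleneck_holds):
    """Probability that no bottleneck holds, assuming independence."""
    return math.prod(1.0 - p for p in p_bottleneck_holds)

# Compute, robotics, data, algorithm search: the estimates given above.
stated = [0.90, 0.95, 0.99, 0.50]
# The sceptical case: only a 25% chance each bottleneck really holds.
sceptical = [0.25] * 4

print(f"stated estimates:  {p_foom(stated):.6f}")
print(f"sceptical version: {p_foom(sceptical):.2f}")
```

If the bottlenecks are correlated (for example, a better algorithm also easing the compute shortage), the product understates or overstates the joint probability, so the independence assumption carries real weight here.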
As for the last part :
CAIS means you use small, isolated AI systems that tested extremely well on the training set. At each timestep, you use an autoencoder to check how well the current input fed to the agent compresses into the state space of the training set. What you are doing is validating that it is impossible for the AI system to know that the current input is not from the training set. This both makes deception difficult and keeps the AI system’s probability of success on the current task similar to how it benchmarked on the training set.
If the real world input is outside that space, you transfer control to a different agent, which in the worst case is an embedded control system using classic control techniques.
This is currently how autonomous cars more or less work (depends on the team and the project).
I have several years of experience actually working on embedded ML systems, and many more years on embedded controls. The above is correct. Eliezer Yudkowsky was wrong to dismiss it.
Note that Eliezer has mentioned that ML teams are going to need to find “some way” to get from roughly an 80% chance (his estimate, I think) that a GPT-3 style agent is correct on a question to the many 9s of real world reliability.
Stateless, well isolated systems are one of the few ways human engineers know how to accomplish that. So we may get a significant amount of AI safety by default simply to meet requirements.
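The gating scheme described above (check whether the current input compresses into the training distribution, otherwise hand control to a fallback) can be sketched as follows. This is a toy under stated assumptions: a one-dimensional linear autoencoder stands in for a real learned one, the controller names and the 1.5x threshold calibration are hypothetical, and nothing here is taken from an actual autonomous-driving stack.

```python
import math
import random

def fit_linear_autoencoder(train, iters=50):
    """Toy stand-in for an autoencoder: the data mean plus the top
    principal direction, found by power iteration."""
    dim = len(train[0])
    mean = [sum(x[i] for x in train) / len(train) for i in range(dim)]
    centered = [[x[i] - mean[i] for i in range(dim)] for x in train]
    direction = [1.0] * dim
    for _ in range(iters):
        # Apply the unnormalised covariance matrix: sum over x of (x . d) x
        new = [0.0] * dim
        for x in centered:
            dot = sum(a * b for a, b in zip(x, direction))
            for i in range(dim):
                new[i] += dot * x[i]
        norm = math.sqrt(sum(v * v for v in new))
        direction = [v / norm for v in new]
    return mean, direction

def reconstruction_error(model, x):
    """Encode x to a single number, decode it, and measure what was lost."""
    mean, direction = model
    c = [a - m for a, m in zip(x, mean)]
    code = sum(a * d for a, d in zip(c, direction))
    recon = [code * d for d in direction]
    return math.sqrt(sum((a - r) ** 2 for a, r in zip(c, recon)))

def choose_controller(model, x, threshold):
    """Use the learned agent only when the input looks like training data."""
    if reconstruction_error(model, x) <= threshold:
        return "learned_agent"
    return "fallback_controller"

# Synthetic training inputs lying close to the line y = 2x.
random.seed(0)
train = []
for _ in range(200):
    t = random.uniform(-1.0, 1.0)
    train.append((t, 2.0 * t + random.uniform(-0.05, 0.05)))

model = fit_linear_autoencoder(train)
# Hypothetical calibration: allow 1.5x the worst error seen in training.
threshold = 1.5 * max(reconstruction_error(model, x) for x in train)

print(choose_controller(model, (0.5, 1.0), threshold))   # on the training manifold
print(choose_controller(model, (1.0, -2.0), threshold))  # far from it
```

A real system would use a nonlinear autoencoder over sensor data and a threshold validated on held-out data; the point of the sketch is only the control flow, where an out-of-distribution input never reaches the learned agent.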
Of course, Eliezer knows about CAIS. He just thinks that it is a clever idea that has no chance to work.
It’s very funny that you think AI can solve the very complex problem of aging, but don’t believe that AI can solve the much simpler problem of “kill everyone”.
There are alternative mitigations to the problem:
Anti aging research
Cryonics
I agree that it’s bad that most people currently alive are apparently going to die. However I think that since mitigations like that are much less risky we should pursue them rather than try to rush AGI.
I think the odds of success (epistemic status: I went to medical school but dropped out) are low if you mean humans researching aging and cryonics alone, without help from any system more capable than current software.
They are both extremely difficult problems.
So the tradeoff is “everyone currently alive and probably their children” vs “future people who might exist”.
I obviously lean one way but this is what the choice is between. Certain death for everyone alive (by not improving AGI capabilities) in exchange for preventing possible death for everyone alive sooner and preventing the existence of future people who may never exist no matter the timeline.
I can’t agree more with you. But this is a complicated position to maintain here in LW, and one that gives you a lot of negative karma
Yep. I have some posts that are +10 karma −15 disagree or more.
Nobody ever defends their disagreements though...
One person did and they more or less came around to my pov.
2. Depends on what you mean by “claims foom”. As I understand it, EY now thinks that foom isn’t necessary anyway; AGI can kill us before it.
4. “I don’t like it” != “irrational and stupid and short sighted”; you need arguments for why it isn’t preferable in terms of the values of this system
6, 7. “be ready to enforce a treaty” != “choose actions to kill billions of people living now”.
Then he needs to show how; saying intelligence alone, with no physical resources, is enough is not realistic
Because maximizers are not how sota AI is built
It works out to be similar.
Your comment is sitting at positive karma only because I strong upvoted it. It is a good comment, but people on this site are very biased in the opposite direction. And this bias is going to drive non-doomers eventually away from this site (probably many have already left), and LW will continue descending in a spiral of non-rationality. I really wonder how people in 10 or 15 years, when we are still around in spite of powerful AGI being widespread, will rationalize that a community devoted to the development of rationality ended up being so irrational. And that was my last comment showing criticism of doomers; every time I do, it costs me a lot of karma.
I wonder what you envision when you think of a world where “powerful AGI” is “widespread”.
Certainly no paperclips
How about AIs that are off the leash of human control, making their own decisions and paying their own way in the world? Would there be any of those?
That’s a possibility
I finally noticed your anti-doom post. Mostly you seem to be skeptical about the specific idea of the single superintelligence that rapidly bootstraps its way to control of the world. The complexity and uncertainty of real life means that a competitive pluralism will be maintained.
But even if that’s so, I don’t see anything in your outlook which implies that such a world will be friendly to human beings. If people are fighting for their lives under conditions of AI-empowered social Darwinism, or cowering under the umbrella of AI superpowers that are constantly chipping away at each other, I doubt many people are going to be saying, oh those foolish rationalists of the 2010s who thought it was all going to be over in an instant.
Any scenario in which AIs have autonomy, general intelligence, and a need to compete, just seems highly unstable from the perspective of all-natural unaugmented human beings remaining relevant.
Doom is doom, dystopia is dystopia.
I guess I will break my recently self-imposed rule of not talking about this anymore.
I can certainly envision a future where multiple powerful AGIs fight against each other and are used as weapons; some might be rogue AGIs and others might be at the service of human-controlled institutions (such as nation-states). To put it more clearly: I have trouble imagining a future where something along these lines DOES NOT end up happening.
But, this is NOT what Eliezer is saying. Eliezer is saying:
The Alignment problem has to be solved AT THE FIRST TRY because once you create this AGI we are dead in a matter of days (maybe weeks/months, it does not matter). If someone thinks that Eliezer is saying something else, I think they are not listening properly. Eliezer can have many flaws but lack of clarity is not one of them.
In general, I think this is a textbook example of the Motte and Bailey fallacy. The Motte is: AGI can be dangerous, AGI will kill people, AGI will be very powerful. The Bailey is: AGI creation means the imminent destruction of all human life and therefore we need to stop now all developments.
I never discussed the Motte. I do agree with that.
I would certainly appreciate knowing the reason for the downvotes
FYI I upvoted your most recent comment, but downvoted your previous few in this thread. Your most recent comment seemed to do a good job spelling out your position and gesturing at your crux. My guess is maybe other people were just tired of the discussion and downvoting sort of to make the whole discussion go away.
Downvoted for the pattern of making a vague claim about LWers being biased, and then responding to followup questions with vague evasive answers with no arguments.
I mean I have almost 1000 total karma and am gaining over time.
The doomers would be convinced the AGIs are just waiting to betray, to “heel turn” on us.