But superhuman capabilities don’t seem to imply “applies all the optimisation pressure it can towards a goal”.
Like, being crazily good at research projects may require the ability to do goal-directed cognition. It doesn’t seem to require the habit of monomaniacally optimising the universe towards a goal.
I think whether or not a crazy good research AI is a monomaniacal universe optimiser probably depends on what kind of AI it is.
The whole approach is pretty hopeless IMHO: I mean the approach of “well, the AI will be wicked smart, but we’ll just make it so that it doesn’t want anything particularly badly or so that what it wants tomorrow will be different from what it wants today”.
It seems fairly certain to me that having a superhuman ability to do things that humans want to be done entails applying strong optimization pressure onto reality—pressure that persists as long as the AI is able to make it persist—forever, ideally, from the point of view of the AI. The two are not separate things like you hope they are. Either the AI is wicked good at steering reality towards a goal or not. If it is wicked good, then either its goal is compatible with continued human survival or not, and if not, we are all dead. If it is not wicked good at steering reality, then no one is going to be able to figure out how to use it to align an AI such that it stays aligned once it is much smarter than us.
I subscribe to MIRI’s current position that most of the hope for continued human survival comes from the (slim) hope that no one builds super-humanly smart AI until there are AI researchers that are significantly smarter and wiser than the current generation of AI designers (which will probably take centuries unless it proves much easier to employ technology to improve human cognition than most people think it is).
But what hope I have for alignment research done by currently-living people comes mostly from the hope that someone will figure out how to make an ASI that genuinely wants the same things that we want—like Eliezer has been saying since 2006 or so.
An entity could have the ability to apply such strong optimization pressures onto reality, yet decide not to.
Such an entity would be useless to us IMHO.
Surely there exists a non-useless and non-world-destroying amount of optimization pressure?
By “non-world-destroying”, I assume you mean, “non-humanity ending”.
Well, yeah, if there were a way to keep AI models to roughly human capabilities that would be great because they would be unlikely to end humanity and because we could use them to do useful work with less expense (particularly, less energy expense and less CO2 emissions) than the expense of employing people.
But do you know of a safe way of making sure that, e.g., OpenAI’s next major training run will result in a model that is at most roughly human-level in every capability that can be used to end humanity or to put and to keep humanity in a situation that humanity would not want? I sure don’t—even if OpenAI were completely honest and cooperative with us.
The qualifier “safe” is present in the above paragraph / sentence because giving the model access to the internet (or to gullible people or to a compute farm where it can run any program it wants) then seeing what happens is only safe if we assume the thing to be proved, namely, that the model is not capable enough to impose its will on humanity.
But yeah, it is a source of hope (which I didn’t mention when I wrote, “what hope I have . . . comes mostly from the hope that someone will figure out how to make an ASI that genuinely wants the same things that we want”) that someone will develop a method to keep AI capabilities to roughly human level and all labs actually use the method and focus on making the human-level AIs more efficient in resource consumption even during a great-powers war or an arms race between great powers.
I’d be more hopeful if I had ever seen a paper or a blog post by a researcher trying to devise such a method.
For completeness’s sake, let’s also point out that we could ban large training runs now worldwide, then the labs could concentrate on running the models they have now more efficiently and that would be safe (not completely safe, but much much safer than any future timeline we can realistically hope for) and would allow us to derive some of the benefits of the technology.
I do not know of such a way. I find it unlikely that OpenAI’s next training run will result in a model that could end humanity, but I can provide no guarantees about that.
You seem to be assuming that all models above a certain threshold of capabilities will either exercise strong optimization pressure on the world in pursuit of goals, or will be useless. Put another way, you seem to be conflating capabilities with actually exerted world-optimization pressures.
While I agree that given a wide enough deployment it is likely that a given model will end up exercising its capabilities pretty much to their fullest extent, I hold that it is in principle possible to construct a mind that desires to help and is able to do so, yet also deliberately refrains from applying too much pressure.