I hate to say this, but I’m taking the side of the spaceplane designers. Perhaps it’s because it’s what I know.
Three things I think it’s important to note explicitly here:
1. Eliezer’s essay above is just trying to state where he thinks humanity’s understanding of AI alignment is, and where he thinks it ultimately needs to be. The point of the fictional example is to make this view more concrete by explaining it in terms of concepts that we already understand well (rockets, calculus, etc.). None of this is an argument for Eliezer’s view “our understanding of AI alignment is relevantly analogous to the fictional rocket example”, just an attempt to be clearer about what the view even is.
2. “Don’t worry about developing calculus, questioning the geocentric model of the solar system, etc.” is the wrong decision in the fictional example Eliezer provided. You suggest, “once you start getting spaceplanes into orbit and notice that heading right for the moon isn’t making progress, you could probably get together some mathematicians and scrum together a rough model of orbital mechanics in time for the next launch”. I don’t think this is a realistic model of how basic research works. Possibly this is a crux between our models?
3. The value of the rocket analogy is that it describes a concrete “way the world could be” with respect to AI. Once this is added to the set of hypotheses under consideration, the important thing is to try to assess the evidence for which possible world we’re in. “I choose to act as though this other hypothesis is true because it’s what I know” should set off alarm bells in that context, as should any impulse to take the side of Team Don’t-Try-To-Understand-Calculus in the contrived fictional example, because this suggests that your models and choices might be insensitive to whether you’re actually in the kind of world where you’re missing an important tool like calculus.
It’s 100% fine to disagree about whether we are in fact in that world, but any indication that we should unconditionally act as though we’re not in that world—e.g., for reasons other than Bayesian evidence about our environment, or for reasons so strong they’re insensitive even to things as important as “we’re trying to get to the Moon and we haven’t figured out calculus yet”—should set off major alarms.
And making a spaceplane so powerful it wrecks the planet if it crashes into it, when you don’t know what you are doing... seems implausible to me.
Eliezer means the rocket analogy to illustrate his views on ‘how well do we understand AI alignment, and what kind of understanding is missing?’, not ‘how big a deal is it if we mess up?’ AI systems aren’t rockets, so there’s no reason to extend the analogy further. (If we do want to compare flying machines and scientific-reasoning machines on this dimension, I’d call it relevant that flying organs have evolved many times in Nature, and never become globally dominant; whereas scientific-reasoning organs evolved just once, and took over the world very quickly.)
A relevant argument that’s nearby in conceptspace is ‘technologies are rarely that impactful, full stop; so we should have a strong prior that AGI won’t be that impactful either’.
I agree we can make an AI that powerful but I think we would need to know what we are doing. Nobody made fission bombs work by slamming radioactive rocks together, it took a set of millions of deliberate actions in a row, by an army of people, to get to the first nuclear weapon.
Eliezer doesn’t mean to argue that we’ll get to AGI by pure brute force, just more brute force than is needed for safety / robustness / precise targeting. “Build a system that’s really good at scientific reasoning, and only solves the kinds of problems we want it to” is a much more constrained problem than “Build a system that’s really good at scientific reasoning”, and it’s generally hard to achieve much robustness / predictability / deep understanding of very novel software, even when that software isn’t as complex or opaque as a deep net.
It sounds to me like key disagreements might include “how much better at science are the first AGI systems built for science likely to be, compared to humans (who weren’t evolved to do science at all, but accidented into being capable of such)?” and “how many developers are likely to have the insights and other resources needed to design/train/deploy AGI in the first few years?” Your view makes more sense in my head when I imagine a world where AGI yields smaller capability gains, and where there aren’t a bunch of major players who can all deploy AGI within a few years of each other.
2. “Don’t worry about developing calculus, questioning the geocentric model of the solar system, etc.” is the wrong decision in the fictional example Eliezer provided. You suggest, “once you start getting spaceplanes into orbit and notice that heading right for the moon isn’t making progress, you could probably get together some mathematicians and scrum together a rough model of orbital mechanics in time for the next launch”. I don’t think this is a realistic model of how basic research works. Possibly this is a crux between our models?
The theoretical framework behind current AI research is essentially “here is what we are regressing between, X and Y”, or “here is some input data X, the outputs in response Y, and a reward R”. The objective is the highest percent correct or the biggest R. And for more complex reasons that I’m going to compress here, you also care about the distribution of the responses.
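To make that framing concrete, here is a minimal Python sketch of the two objectives I have in mind; the function names and interfaces are mine, purely for illustration, not any particular library’s.

```python
import numpy as np

# Supervised case: given inputs X and target outputs Y, the objective is
# "percent correct" (or, more generally, a low loss).
def percent_correct(predict, X, Y):
    return np.mean([predict(x) == y for x, y in zip(X, Y)])

# Reward case: given an environment step that maps an action to a reward R,
# the objective is simply the biggest total R.
def total_reward(policy, env_step, n_steps):
    return sum(env_step(policy(t)) for t in range(n_steps))
```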
This is something we can run with. We can iteratively deploy an overall framework—a massive AI platform supported by a consortium of companies, offering the best and most consistent performance—that supports ever more sophisticated agent architectures. At first, the supported architectures would be for problems where the feedback is immediate and the environment the system operates in is very Markovian and clean; later we would be able to solve more abstract problems.
With this basic idea we can replace most current jobs on Earth and develop fully autonomous manufacturing, resource gathering, and construction.
Automating scientific research: there’s a way to extend this kind of platform to design experiments autonomously. Essentially you build upon a lower-level predictive model by predicting the outcomes of composite experiments that use multiple phenomena at once, and you conduct more experiments where the variance is high. It’s difficult to explain and I don’t have it fully mapped out, but I think developing a systematic model of how macroscale mechanical systems work could be done autonomously. Then the same idea scales to how low-level subatomic systems work, to iteratively engineering nanotechnology, and maybe to working through cell biology in a similar way.
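In code, the “run more experiments where the variance is high” step might look something like the sketch below. The ensemble interface is an assumption I’m making just to illustrate the idea, not a description of any existing system.

```python
import numpy as np

# An ensemble of predictive models scores candidate composite experiments;
# the candidates the models disagree on most (highest predictive variance)
# are the ones worth running next.
def pick_next_experiments(candidates, ensemble, k=10):
    disagreement = [np.var([model.predict(c) for model in ensemble]) for c in candidates]
    ranked = np.argsort(disagreement)[::-1]   # most uncertain first
    return [candidates[i] for i in ranked[:k]]
```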
Umm, maybe the big picture will explain it better: you have hundred-story-plus megaliths of robotic test cells, where the cells were made in an automated factory. For cracking problems like nanotechnology or cell biology, each test cell conducts an experiment at some level of integration to address the unreliable parts. For example, if you have nanoscale gears and motors working well, but not switches, each test cell searches possible variants of a switch—not the entire grid exhaustively, but using search trees to guess where a successful switch design might be—to get that piece to work.
And you have a simulator—a system using both learnable weights and some structure—that predicts which switch designs won’t work. You feed into the simulator the error between what it predicted would happen and what the actual robotic test waldos find in reality. This update to the simulation model makes the next piece of the long process toward nanoscale self-replicating factories more likely to succeed.
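A rough sketch of that loop is below. Every object and method here (the simulator, the search tree, the test cells) is an assumed interface, written out only to show the shape of the predict-test-update cycle I mean.

```python
# Hypothetical outer loop: a learned simulator and a search tree propose promising
# switch designs, the physical test cells try them, and the simulator is corrected
# by its own prediction error before the next round.
def design_loop(simulator, search_tree, test_cells, rounds):
    best = None
    for _ in range(rounds):
        candidates = search_tree.expand(simulator)            # guess where good designs might be
        predictions = {c: simulator.predict(c) for c in candidates}
        results = test_cells.run(candidates)                  # what the robotic waldos actually measure
        errors = {c: results[c] - predictions[c] for c in candidates}
        simulator.update(errors)                              # feed the prediction error back in
        search_tree.prune(results)                            # stop exploring regions that failed
        best = max(results, key=results.get)
    return best
```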
And a mix of human scientists and engineers, plus scripts that call on machine learning models, decides what to do next once a particular piece of the problem is reliably solved.
There are humans involved; it would not be a hands-off system. And the robotic system operating in each test cell uses a well-known, rigidly designed architecture that can be understood, even if you don’t know how the details of each module function, since the modules are weighted combinations of multiple machine learning algorithms, some of which were in turn developed by other algorithms.
I have a pet theory that even if you could build a self-improving AI, you would need to give it access to such megaliths (a cube of modular rooms as wide on each side as it is tall, where each room was made in a factory, trucked onto the site, and installed by robots) to generate the clean information needed to do the kinds of magical things we think superintelligent AIs could do.
Robotic systems are the way to get that information because each step they perform is replicable. And you subtract what happens without intervention by the robotic arm from what happens when you do intervene, giving you clean data that contains only the intervention, plus whatever variance the system you are analyzing has inherently. I have a theory that things like nanotechnology, or the kind of real medicine that could reverse human biological age and turn off all possible tumors, or all the other things we know the laws of physics permit but we cannot yet do, can’t be found in a vacuum. If you could build an AI “deity”, it couldn’t come up with these solutions just from what humans have published (whether that be every scientific journal ever written or every written word and recorded image), because far too much uncertainty would remain. You still wouldn’t know, even with all that information analyzed, exactly what arrangements of nanoscale gears will do in a vacuum chamber. Or what the optimal drug regimen to prevent Ms. Smith from developing another myocardial infarction would be. You could probably get closer than humans ever have—but you would need to manipulate the environment to find out what you needed to do.
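The subtraction step is simple enough to write down directly; a minimal sketch, with purely illustrative names:

```python
import numpy as np

# Repeated paired trials: the same measurement taken with and without the robotic
# arm intervening. Differencing isolates the effect of the intervention itself,
# plus whatever variance the system has inherently.
def intervention_effect(with_intervention, without_intervention):
    diff = np.asarray(with_intervention) - np.asarray(without_intervention)
    return diff.mean(), diff.std()   # estimated effect and its spread
```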
This is the concrete reason for my assessment that out-of-control AGI is probably not as big a risk as we think. If such machines can’t find the information needed to kill us all without systematically investigating it with a large amount of infrastructure, and the host hardware for such a system is specialized and not just freely available on unsecured systems on the internet, and we haven’t actually designed these systems with anything like self-reflection, much less awareness, then the scenario seems pretty implausible.
But I could be wrong. Having a detailed model of how I think such things would really work, based upon my previous work with present-day AI, doesn’t necessarily make me correct. But I certainly feel more correct.
I don’t think this is a realistic model of how basic research works. Possibly this is a crux between our models?
I’m responding to this statement directly in this post. No, this isn’t how basic research works. But just because centuries of inertia cause basic research to be structured a certain way doesn’t mean it has to be that way, or that my original statement is wrong.
You could assemble a quick-and-dirty model using curve fitting that would approximately tell you the relationship between the position of the Moon in the sky and the rocket’s thrust vector. It wouldn’t need to be a complete theory of gravitation, the kind of theory that took centuries to develop. And it would work: approximate models are very often good enough.
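For example, something like the toy fit below would already give you a usable aim correction for the next launch. The numbers are placeholders, not real data; the point is only how little theory a workable approximation needs.

```python
import numpy as np

# Logged pairs from previous launches: where the Moon was in the sky, and which
# thrust angle got the spaceplane closest. (Placeholder values, for illustration only.)
moon_angle = np.array([10.0, 25.0, 40.0, 55.0, 70.0])     # degrees above the horizon
thrust_angle = np.array([12.0, 31.0, 52.0, 75.0, 99.0])   # degrees

# A low-order polynomial fit: a crude empirical model, no theory of gravitation required.
fit = np.poly1d(np.polyfit(moon_angle, thrust_angle, deg=2))
print(fit(47.5))   # suggested thrust angle for the next launch
```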