Practical AGI will necessarily take the form of populations of agents due to computational constraints (costly memory transactions need to be amortized, parallel scaling constraints, etc.). This is true today, where you need to run at least around 100 AI instances on a single GPU at once to get good performance. This will remain true into the future—it's a hard constraint from the physics of fast hardware.
I don’t know about this, but would be happy to hear more.
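If I sketch the arithmetic I take to be behind that claim, it looks like a roofline argument: each decode step has to stream the full set of weights from memory no matter how many instances share the GPU, so per-GPU throughput keeps improving with batch size until compute becomes the bottleneck. All numbers below (7B parameters, fp16 weights, 2 TB/s of memory bandwidth, 300 TFLOP/s) are my own assumptions for illustration, not from the quoted text.

```python
# Rough roofline-style sketch of the batching claim above; all hardware numbers
# are my own assumptions (loosely modern-datacenter-GPU-like), not from the quote.

def step_time(batch, params=7e9, bytes_per_param=2,
              mem_bw=2e12, peak_flops=3e14):
    """Lower bound on the time for one decode step shared by `batch` instances."""
    memory_time = params * bytes_per_param / mem_bw   # weights streamed once, shared by the whole batch
    compute_time = batch * 2 * params / peak_flops    # ~2 FLOPs per parameter per instance
    return max(memory_time, compute_time)

for batch in (1, 10, 100, 150, 1000):
    t = step_time(batch)
    print(f"batch={batch:5d}  step time={t * 1e3:6.2f} ms  "
          f"aggregate throughput={batch / t:9.0f} steps/s")
```

With those assumed numbers the crossover lands at a batch of about 150, which at least matches the order of magnitude of the "around 100 instances" figure.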
Superintelligence can imply size (a big civilization), speed, or quality. All of these are under our control: we can test a smaller population in the sandbox, we can run it at a manageable speed, and we control its knowledge.
I don’t think the point is “controlling” these properties, I think the point is drawing conclusions about what an AI will do in the real world. Reduced speed might allow us to run “fast AIs” in simulation and draw conclusions about what they’ll do. Reduced speed might also let us run AI civilizations of large size (though it’s not obvious to me why you’d want such a thing) and draw conclusions about what they’ll do. Reducing the AI’s knowledge seems like a way to make a simulation more computationally tractable and therefore get better predictions about what the AI will do—but it seems like a risky way that can introduce bias into a simulation.
Sandboxing will test entire agent architectures—the equivalent of DNA blueprints for human brains—to determine whether samples from those architectures have highly desirable mental properties such as altruism.
My real problem is that I don’t think just testing for altruism (which I assume means altruistic behavior) is remotely good enough. If we could simulate our world out past the point where an AI becomes more powerful than the human race, and select for altruism then, I’d be happy. But I am pretty confident that there will be big problems generalizing from a simulation to reality if that simulation both differs from reality and restricts the possible actions and possible values.
If we’re just testing a self-driving car, we can make a simulation that captures the available actions (both literal outputs and “effective actions” permitted by the dynamics) and has basically the right value function built in from the start. Additionally, self-driving cars generalize well from the model to reality. Suppose you have something unrealistic in the model (say, other cars follow set training trajectories rather than reacting to the actions of the car). A realistic self-driving car that does well in the simulation might be bad at some skills like negotiating for space on the road, but it won’t suddenly, say, try to use its tire tracks to spell out letters if you put it into reality with humans.
To put what I think concretely: when exposed to a difference between training and reality, a “dumb, parametric AI” projects reality onto the space it learned in training and just keeps on plugging, making it somewhat insensitive to reality being complicated and giving us a better idea of how it will generalize. But a “smart AI” doesn’t seem to have this property; it will learn the complications of reality that were omitted in testing, and can act very differently as a result. This goes back to the problem of expanding sets of effective actions.
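To make the first half of that contrast concrete, here is a toy sketch (my own construction, not anything from the original point): a small parametric model fit in a restricted training regime keeps applying the same learned mapping outside that regime, so its out-of-distribution behavior, while wrong, is at least predictable from its parameters. The worry is that a smarter system, which can notice and exploit the complications the simulation left out, gives no analogous handle.

```python
# Toy illustration (my own, not from the original) of the "dumb, parametric AI" point:
# a fixed-capacity model fit in a restricted training regime keeps applying the same
# learned mapping outside that regime.

import numpy as np

rng = np.random.default_rng(0)

def reality(x):
    # "Reality" has a complication (saturation) that never shows up for x in [0, 1].
    return np.minimum(x, 1.2) + 0.01 * rng.normal(size=np.shape(x))

# Training regime: inputs restricted to [0, 1], where reality looks linear.
x_train = rng.uniform(0.0, 1.0, size=200)
y_train = reality(x_train)

# "Dumb, parametric AI": a degree-1 fit. Its behavior everywhere is fully
# determined by two numbers learned in the restricted regime.
slope, intercept = np.polyfit(x_train, y_train, deg=1)

# Deployment regime: inputs the model never saw.
x_test = np.array([0.5, 1.5, 3.0, 10.0])
print("model prediction:", slope * x_test + intercept)   # keeps extrapolating linearly
print("actual outcome:  ", reality(x_test))              # saturates at ~1.2
```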