Yes, I do. To disentangle it a bit:
- One line of reasoning: if you have this “preschooler-level-intelligence to superintelligence over the course of 24 hours”, you probably had something that was able to learn really fast and generalize a lot before this. What does the rest of the world look like at that point?
- Second: if you have control over the learning time, then in the continuous version you can slow down or halt. Yes, you need some fast oversight control loop to do that, but getting to a state where this is what you have, because that’s what sane AI developers do, seems tractable. (I also think this has a decent chance of becoming instrumentally convergent for AI developers.) A minimal sketch of such a loop is below.
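To make the “fast oversight control loop” concrete, here is a minimal sketch in Python. Everything in it is hypothetical: `train_one_step`, `capability_eval`, and the halt threshold are stand-ins for whatever the real training framework and evaluation battery would provide. The only point is that capability gets checked frequently and the default action on crossing the threshold is to halt, not to keep training.

```python
# Hypothetical sketch of a fast oversight control loop wrapped around training.
# `train_one_step` and `capability_eval` are placeholders, not real APIs.

EVAL_EVERY_N_STEPS = 100         # how often the oversight loop checks capability
CAPABILITY_HALT_THRESHOLD = 0.9  # stand-in for "roughly human level" on some eval


def train_one_step(state):
    """Placeholder for one update step of the hypothetical RL algorithm."""
    state["steps"] += 1
    return state


def capability_eval(state):
    """Placeholder for a fast capability-evaluation battery, returning a score in [0, 1]."""
    return min(1.0, state["steps"] / 10_000)


def oversight_loop(state):
    while True:
        state = train_one_step(state)
        if state["steps"] % EVAL_EVERY_N_STEPS == 0:
            score = capability_eval(state)
            print(f"step={state['steps']} capability={score:.3f}")
            if score >= CAPABILITY_HALT_THRESHOLD:
                # Halting is the default; continuing past this point has to be
                # an explicit choice by the developer.
                print("Capability threshold reached; halting training for review.")
                return state


if __name__ == "__main__":
    oversight_loop({"steps": 0})
```

Whether `capability_eval` actually tracks the dangerous capabilities is, of course, doing all of the work here.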
OK, here’s my scenario in more detail. Someone thinks of a radically new AI RL algorithm tomorrow. They implement it and start a training run. It trains up from randomly initialized to preschooler-level in the first 24 hours, and then from preschooler-level to superintelligence in the second 24 hours.
I’m not claiming this scenario is realistic, just that it illustrates a problem with your definition of continuity. I think this scenario is “continuous” by your definition, but that the continuity doesn’t buy you any of the things on your “how continuity helps” list. Right?
(My actual current expectation (see 1, 2) is that people today are already developing bits and pieces of an AGI-capable RL algorithm, and at some indeterminate time in the future the pieces will all come together and people will work out the kinks and the tricks of scaling it up, etc. As this happens, the algorithm will go from preschool-level to superintelligent over the course of maybe a year or two, and that year or two will not make much difference: whatever safety problems they encounter will not have easy solutions, a year or two is just not enough time to figure out and implement non-easy solutions, and pausing won’t be an option because of competition.)
So, I do think continuity buys you things, even in this case, roughly in the way outlined in the post: it is the developer’s choice whether to continue training after passing some roughly-human level, and with sensible oversight in place, they should notice and stop before getting to superintelligence.
You may ask why they would have that oversight in place. My response is: possibly because of some small-scale AI disaster that happened before, people understand the tech is dangerous.
With the 2-year scenario, again, I think there is a decent chance of stopping before or roughly at the human level. One story for why that may be the case: it seems quite possible that the level at which AI gets good at convincing/manipulating humans, or at which humans freak out for other reasons, is lower than AGI. If you get enough economic benefits from CAIS at the same time, you can also get strong counter-pressures to the competition to develop AGI.
How does the researcher know that it’s about to pass roughly human level, when a misaligned AI may have the incentive and the ability to fake a plateau at about human level until it has enough training and internal bootstrapping to become strongly superhuman? Even animals have the capability to fake being weaker, injured, or dead when it might benefit them.
I don’t think this is necessarily what will happen, but it is a scenario that needs to be considered.
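To make this concern concrete, here is a hypothetical toy check of the kind a researcher might be tempted to rely on: flag the case where the reported benchmark score has stalled while an independent training signal (here, training loss) keeps improving. The function name, window size, and thresholds are invented for illustration, and a genuinely deceptive, situationally aware model could presumably game the independent signal too; the sketch mainly shows how thin the evidence behind “it looks like it has plateaued” can be.

```python
# Hypothetical toy check for a "suspicious plateau": reported benchmark scores
# have stopped improving while training loss is still dropping. All names,
# window sizes, and thresholds are invented for illustration only.

def is_suspicious_plateau(benchmark_scores, train_losses,
                          window=5, score_eps=0.01, loss_eps=0.05):
    """Return True if the last `window` benchmark scores are roughly flat
    while training loss dropped noticeably over the same evaluations."""
    if len(benchmark_scores) < window or len(train_losses) < window:
        return False
    recent_scores = benchmark_scores[-window:]
    recent_losses = train_losses[-window:]
    score_gain = recent_scores[-1] - recent_scores[0]
    loss_drop = recent_losses[0] - recent_losses[-1]
    return score_gain < score_eps and loss_drop > loss_eps


# Example: reported scores flatten out while loss keeps falling, so it is flagged.
scores = [0.71, 0.71, 0.71, 0.71, 0.71]
losses = [1.20, 1.05, 0.90, 0.78, 0.65]
print(is_suspicious_plateau(scores, losses))  # True
```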