Thanks for the detailed response! It clarifies some of my concerns, and I think we have a lot of agreement overall. I’m also going to go in roughly reverse order.
> To a first approximation, compute_cost = size*speed. If AGI requires brain size, then the first to cross the finish line will likely be operating not greatly faster than the minimum speed, which is real-time. But this does not imply the agents learn at only real-time speed, as learning is parallelizable across many agent instances. Regardless, none of these considerations depend on whether the AGI is trained in a closed simbox or an open sim with access to the internet.
To me the time/cost issue with the simboxes you proposed is in the data you need to train the AGIs from within the sim to prevent information leakage. Unlike with current training, we can’t just give it the whole internet, as that will contain loads of information about humans, how ML works, the fact that it is in a sim, etc., which would be very dangerous. Instead, we would need to recapitulate the entire *data generating process* within the sim, which is what would be expensive. Naively, the only way to do this would be to actually simulate a bunch of agents interacting with the sim world for a long time, which would take at minimum simulated years for human-level data efficiency and much, much longer for current DL. It is possible, I guess, to amortise this work and create one ‘master-sim’ so that we can try various AGI designs which all share the same dataset. This would be good experimentally for isolating the impact of architecture/objective vs dataset, but under the reward-proxy learning approach a large factor in the success of alignment is the dataset itself, which would be very expensive to recreate in sim without information transfer from our reality.
Training current ML models is very fast because they can use all the datasets already generated by human civilisation. Bootstrapping to similar levels of intelligence in a sim, without wholesale transfer of information from our reality, will require a concomitant amount of computational effort: more like simulating our civilisation than simulating a single agent.
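As a crude illustration of the scale gap, here is a back-of-envelope sketch. The per-agent figure reuses the 1e16 ops/s brain-scale estimate quoted below; the population size and number of generations are made-up placeholders, purely to show how the cost of regenerating a civilisation-scale dataset dwarfs a single agent-lifetime:

```python
# Back-of-envelope: regenerating a civilisation-scale data-generating process
# in-sim vs training a single agent. The per-agent figure reuses the 1e16 ops/s
# brain-scale estimate quoted below; population and generations are placeholders.
agent_ops_per_s = 1e16
lifetime_s = 1e9                              # ~30 subjective years
one_agent_ops = agent_ops_per_s * lifetime_s  # 1e25 ops for a single lifetime

population = 10_000        # assumed size of the in-sim civilisation
generations = 10           # assumed depth of in-sim cultural history
civilisation_ops = one_agent_ops * population * generations  # 1e30 ops

print(f"civilisation-scale data generation ≈ {civilisation_ops / one_agent_ops:.0e}x one agent")
```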
> The ideal baseline cost of simboxing is only O(N+1) vs O(N) without—once good AGI designs are found, the simboxing approach requires only one additional unboxed training run (compared to never using simboxes). We can estimate this additional cost: it will be around or less than 1e25 ops (1e16 ops/s for a brain-size model * 1e9 seconds for 30 years equivalent), or less than $10 million (300 GPU-years) using only today’s GPUs, i.e. nearly nothing.
I don’t understand this. Presumably we will want to run a lot of training runs in the sim, since we will probably need to iterate a considerable number of times to actually succeed in training a safe AGI. We will also want to test across a large range of datasets and initial conditions, which will necessitate collecting a number of large and expensive sim-specific datasets. It is probably also necessary to simulate reasonably large sim populations, which will further increase the cost.
> But let’s suppose there still is significant optimization slack; then in a sense you’ve almost answered your own question... we can easily incorporate new algorithmic advances into new simboxes or even upgrade agents mid-sim using magic potions or whatnot.
Perhaps I’m missing something here, but I don’t understand how this is supposed to work. Having AGI agents doing ML research in a reality which is close enough to our own that its insights transfer to our reality defeats the whole point of having a sim, which is preventing information leakage about our reality! On the other hand, if some magical alternative to the intelligence explosion is invented in the sim, then we, the simulators, won’t necessarily be able to invent the corresponding ML techniques in our reality.
> Secondly, the algorithms of intelligence are much simpler than we expected, and brains already implement highly efficient or even near Pareto-optimal approximations of the ideal universal learning algorithms.
> To the extent either of those major points is true, rapid FOOM is much less likely; to the extent both are true (as they appear to be), then very rapid FOOM is very unlikely.
I agree that FOOM is very unlikely from the view of the current scaling laws, which imply strongly sublinear returns on investment. The key unknown quantity at this point is the return on ‘cognitive self-improvement’, as opposed to just scaling in terms of parameters and data. We have never truly measured this, as we haven’t yet developed appreciably self-modifying and self-improving ML systems. On the outside view, power-law diminishing returns are probably likely in this domain as well, but we just don’t know.
Similarly, I agree that if contemporary ML is already in its asymptotically optimal scaling regime, i.e. if it is a fundamental constraint of the universe that intelligence can do no better than power-law scaling (albeit with potentially much better coefficients than now), then FOOM is essentially impossible, and I think some form of humanity stands a pretty reasonable chance of survival. There is some evidence that ML is in the same power-law scaling regime as biological brains, as well as many algorithms from statistics, but I don’t think the evidence conclusively rules out a radically better paradigm which perhaps both we and evolution haven’t found, potentially because it requires some precise combination of a highly parallel brain and a fast serial CPU-like processor that couldn’t be built by evolution out of biological components. Personally, I think (and it would be great if you could convince me otherwise) that there are a lot of unknown unknowns in this space, and the evidence from current ML and neuroscience isn’t that strong against there being unknown and better alternatives that could lead to FOOM. Ideally, we would understand the origins of scaling laws well enough that we could derive computational complexity bounds on the general capabilities of learning agents.
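For concreteness, the regime I have in mind is the usual empirical power-law form (the symbols below are placeholders rather than measured values):

$$L(C) \approx L_\infty + a\,C^{-\alpha}, \qquad 0 < \alpha < 1,$$

where $L$ is loss and $C$ is effective training compute. Each doubling of $C$ buys a shrinking absolute improvement, so self-improvement that only adds effective compute faces the same diminishing returns; FOOM would require changing the exponent or escaping this functional form entirely.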
> But even without rapid FOOM, we still can have disaster—for example consider the scenario of world domination by a clan of early uploads of some selfish/evil dictator or trillionaire. There’s still great value in solving alignment here, and (to my eyes at least) much less work focused on that area.
Yes, of course, solving alignment in this regime is extremely valuable. With any luck, reality will be such that we end up in this regime, and I think alignment is actually solvable here, while I’m very pessimistic in a full FOOM scenario. Indeed, I think we should spend a lot of effort figuring out whether FOOM is even possible and, if it is, figuring out how to stop the agents we build from FOOMing, since this scenario is where a large amount of p(doom) comes from.
> Assume there was 1.) large algorithmic slack, and 2.) some other approach that was both viable and significantly different, then it would have to:
> not use adequate testing of alignment (ie simboxes)
> or not optimize for product of intelligence potential and measurable alignment/altruism
If there is enough algorithmic slack that FOOM is likely, then I think our ability to simulate such an event in simboxes will be highly limited, and so we should focus much more on designing generally safe objectives which, ideally, we can mathematically show scale over huge capability gaps, if such safe objectives exist at all. We should also put a lot of effort into figuring out how to constrain AGIs so that they don’t want to, or can’t, FOOM. I completely agree, though, that in general we should spend a lot of effort on building simboxes and measurably testing for alignment before deploying anything.
> To me the time/cost issue with the simboxes you proposed is in the data you need to train the AGIs from within the sim to prevent information leakage. Unlike with current training, we can’t just give it the whole internet, as that will contain loads of information about humans, how ML works, the fact that it is in a sim, etc., which would be very dangerous. Instead, we would need to recapitulate the entire data generating process within the sim, which is what would be expensive.
I’m not quite sure what you mean by data generating process, but the training cost is no different for a tightly constrained run vs an unconstrained run. An unconstrained run would involve something like a current human development process, where after, say, 5 years of basic sensory/motor grounding experience the agents learn language and then go on the internet. A constrained run is exactly the same, but set in a much earlier historical time, long before the internet. The construction of the sim world to recreate the historical era is low cost in comparison to the AGI training costs.
> Naively, the only way to do this would be to actually simulate a bunch of agents interacting with the sim world for a long time, which would take at minimum simulated years for human-level data efficiency and much, much longer for current DL.
I’m expecting AGI will require the equivalent of, say, 20 years of experience, which we can compress about 100x through parallelization rather than serial speedup, basically just like in current DL systems. Consider VPT, for example, which reaches expert human level in Minecraft after training on the equivalent of 10 years of human Minecraft experience.
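A quick back-of-envelope of what that compression implies for wall-clock time (a sketch; the 20 years of experience and ~100x parallelization are just the figures above, not independent estimates):

```python
# Wall-clock time for ~20 subjective years of experience compressed ~100x by
# running many real-time agent instances against one shared model.
seconds_per_year = 3.15e7
experience_s = 20 * seconds_per_year    # ~6.3e8 s of subjective experience
parallel_agents = 100                   # real-time agents sharing one model
wallclock_days = experience_s / parallel_agents / 86400
print(f"wall-clock training time ≈ {wallclock_days:.0f} days")  # ~73 days
```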
> It is possible, I guess, to amortise this work and create one ‘master-sim’ so that we can try various AGI designs which all share the same dataset. This would be good experimentally for isolating the impact of architecture/objective vs dataset, but under the reward-proxy learning approach a large factor in the success of alignment is the dataset itself, which would be very expensive to recreate in sim without information transfer from our reality.

(Well-constructed open-world RPGs already largely do this, modulo obvious easter eggs, and they aren’t even trying very hard.)
I’m not really sure what you mean by ‘dataset’ here, as there isn’t really a dataset other than the agent’s lifetime experiences in the world, procedurally generated by the sim. Like I said in the article, the simplest early simboxes don’t need to be much more complex than Minecraft, but obviously it gets more interesting when you have a richer, more detailed fantasy world with its own history, books, magic system, etc. None of this is difficult to create now, and it is only getting easier and cheaper. The safety constraint is not zero information transfer at all, as that wouldn’t even permit a sim; the constraint is to filter out modern knowledge or anything that is out of character for the sim world’s coherence.
We want to use multiple worlds and scenarios to gain diversity and robustness, but again that isn’t so difficult or costly.
> The ideal baseline cost of simboxing is only O(N+1) vs O(N) without

> I don’t understand this. Presumably we will want to run a lot of training runs in the sim since we will probably need to iterate a considerable number of times to actually succeed in training a safe AGI.
Completely forget LLMs; just temporarily erase them from your mind for a moment. There is an obvious path to AGI—DeepMind’s path—which consists of reverse engineering the brain and testing new architectures in ever more complex sim environments: starting with Atari, now moving on to Minecraft, recapitulating video games’ march of Moore’s Law progress. This path is already naturally using simboxes and is thus safe. So in this framework, let’s say it requires N training experiments to nail AGI (where each experiment trains a single shared model on around a human lifetime’s worth of experience, but parallelizing over a hundred to a thousand agents, as is done today). Then using simboxes is just a matter of never training the AGI in an unsafe world until the final training run, once the design is perfected. The cost is then ideally just one additional training run.
So the only way that the additional cost of safe simboxing could be worse/larger than one additional training run is if there is some significant disadvantage to training in purely historical/fantasy sim worlds vs sci-fi/modern sim worlds.
But we have good reasons to believe there shouldn’t be any such disadvantage: the architecture of the human brain certainly hasn’t changed much in the last few thousand years, intelligence is very general, etc.
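To make that concrete, here is a rough sketch of the marginal cost of that one final unboxed run, reusing the figures I quoted earlier (1e16 ops/s for a brain-scale model, ~1e9 seconds of experience); the per-GPU throughput and rental price are my assumptions, chosen only to show how the ~300 GPU-years / under $10M estimate falls out:

```python
# Marginal cost of the single extra (unboxed) training run: N simbox runs + 1
# final run vs N unboxed runs. Figures: 1e16 ops/s brain-scale model, 1e9 s of
# experience (quoted above); GPU throughput and price are assumed placeholders.
model_ops_per_s = 1e16
experience_s = 1e9
total_ops = model_ops_per_s * experience_s   # 1e25 ops for the extra run

gpu_ops_per_s = 1e15        # assumed effective throughput per GPU
gpu_cost_per_hour = 3.5     # assumed rental price in USD

gpu_seconds = total_ops / gpu_ops_per_s
gpu_years = gpu_seconds / 3.15e7
cost_usd = gpu_seconds / 3600 * gpu_cost_per_hour
print(f"extra run ≈ {gpu_years:.0f} GPU-years, ≈ ${cost_usd/1e6:.1f}M")
```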
> Having AGI agents doing ML research in a reality which is close enough to our own that its insights transfer to our reality defeats the whole point of having a sim, which is preventing information leakage about our reality!
No agents are doing ML research in the simboxes; I said that agents (or rather architectures) determined to be reasonably safe/altruistic can ‘graduate’ to reality and help iterate.
> There is some evidence that ML is in the same power-law scaling regime as biological brains, as well as many algorithms from statistics, but I don’t think the evidence conclusively rules out a radically better paradigm which perhaps both we and evolution haven’t found
I mostly agree with you about FOOM and scaling regimes. However, I do believe there is various work in learning theory which suggests some bounds on scaling laws (I just haven’t read that literature recently). For example, there are some scenarios (depending on the statistical assumptions you place on efficient circuit/data distributions) where standard linear SGD (based on normal assumptions) is asymptotically suboptimal compared to alternatives like exponentiated/multiplicative gradient descent, if the normal assumption is wrong and the circuit distribution is actually log-normal. There was also a nice paper recently which laid out a taxonomy and hierarchy of the known learning algorithms that approximate ideal Bayesian learning (I need to re-find it).
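For illustration, here is a minimal sketch of the two update rules being contrasted (additive gradient descent vs an exponentiated/multiplicative update) on a toy regression problem with log-normal ground-truth weights. The problem size, step sizes, and distributions are arbitrary choices; nothing here is meant to reproduce any particular paper’s result, only to show the shape of the two updates:

```python
import numpy as np

# Toy comparison of additive (SGD-style) vs multiplicative (exponentiated
# gradient) updates on non-negative linear regression with log-normal weights.
rng = np.random.default_rng(0)
d, n = 50, 200
w_true = rng.lognormal(mean=0.0, sigma=1.0, size=d)   # heavy-tailed true weights
X = rng.normal(size=(n, d))
y = X @ w_true

def mse_and_grad(w):
    r = X @ w - y
    return r @ r / n, 2 * X.T @ r / n

w_add = np.ones(d)   # additive update: w <- w - lr * grad
w_mul = np.ones(d)   # multiplicative update: w <- w * exp(-lr * grad)
lr = 1e-3

for _ in range(5000):
    _, g = mse_and_grad(w_add)
    w_add -= lr * g
    _, g = mse_and_grad(w_mul)
    w_mul *= np.exp(-lr * g)

print("additive GD loss:      ", mse_and_grad(w_add)[0])
print("multiplicative EG loss:", mse_and_grad(w_mul)[0])
```

Which of the two wins on this toy problem isn’t the point; the relevant claim from the learning-theory literature is about the asymptotic behaviour of the multiplicative family when the true weight distribution is heavy-tailed (e.g. log-normal) rather than normal.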