The end of Moore’s Law is actually a series of incomplete barriers, each of which allows only an increasingly narrow class of computational designs to scale past it: Dennard scaling blocked serial architectures (CPUs); next, the end of energy scaling will block von Neumann architectures (GPUs/TPUs), allowing only neuromorphic/PIM designs to scale much further; then there is the final size-scaling barrier for all irreversible computation, and only exotic reversible/quantum computers scale past that.
Funnily enough, if this paragraph had appeared in the original text by way of explanation for what you meant by “The future of AGI is to become more like the brain, not less.” then I would not have objected. Sorry for the misunderstanding. I do still think we have some sort of disagreement about takeoff and timelines modelling, but maybe we don’t.
I shouldn’t have said laptop; I should have said whatever it was you said (GPUs etc.) I happen to also believe it could in principle be done on a laptop with the right knowledge (imagine God himself wrote the code) but I shouldn’t have opened that can of worms. I agree that for all practical purposes it may as well be impossible.
If I had to guess at the crux of your disagreement on timelines, I think you might disagree about the FOOM process itself, but not about energy being a taut constraint on the first human-level AGI (you both seem to agree it isn’t). Per Jacob’s model, if a FOOM requires the AGI to quickly become much, much smarter than humans, that excess smartness will inherently come with a massive electrical cost, which will cap it out at O(10^9) human-brain-equivalents until it can substantially increase world energy output. This would serve to arrest FOOM at roughly human-civilization-scale collective intelligence, except with much better coordination abilities.
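(For intuition on where a number like 10^9 could come from, here is a rough back-of-envelope sketch; the per-brain-equivalent wattage and the world electricity figure are my own illustrative assumptions, not numbers taken from Jacob’s model.)

```python
# Rough back-of-envelope only; both constants below are illustrative assumptions,
# not figures from Jacob's model.
WATTS_PER_BRAIN_EQUIVALENT = 1_000   # assumed GPU/datacenter cost per human-brain-equivalent (vs ~20 W for biology)
WORLD_ELECTRICITY_WATTS = 3e12       # assumed ~3 TW of average world electrical generation

max_brain_equivalents = WORLD_ELECTRICITY_WATTS / WATTS_PER_BRAIN_EQUIVALENT
print(f"{max_brain_equivalents:.0e}")  # -> 3e+09, i.e. O(10^9) before world energy output becomes the binding constraint
```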
To me, this was a pretty significant update, as I was previously imagining FOOMing to not top out before it was way way past human civilization’s collective bio-compute.
What do you mean by “arrest FOOM”? I am quite confident that by the time the intelligence explosion starts winding down, it’ll be past the point of no return for humans. Maybe from the AIs’ perspective progress will have stagnated due to compute constraints, and further progress will happen only once they can design exotic new hardware, so subjectively it feels like aeons of stagnation. But I think that “human-civilization-scale collective intelligence except with much better coordination abilities” is vastly underselling it. It’s like saying caveman humans were roughly elephant-scale intelligences except with better coordination abilities. Or saying that SpaceX is “roughly equivalent to the average US high school 9th grade class, except with better coordination abilities.” Do you disagree with this?
I’m not entirely sure what you mean by “better coordination abilities”, but the primary difference between 9th graders and SpaceX employees is knowledge/training. The primary difference between elephants and caveman humans was the latter’s possession of language, and thus technology, culture, and knowledge accumulation beyond a single lifetime.
AGI instances of the same shared mind/model should obviously have a coordination advantage, as should those created by the same organization, but there are many organizations that may be creating AGI.
Even in a world where AGI is running on GPUs and is scale-out bound by energy use and fab output, it may be that a smaller number of larger-than-human minds trained on beyond-human experience have a strong advantage, and in general I’d expect those types of advantages to matter at least as much as ‘coordination abilities’.
I don’t have a precise definition in mind since I was parroting Yonadav. My point was that SpaceX is way better than a random similarly-sized group of high schoolers in many, many important ways, even though SpaceX consumes just as many calories/energy as the high schoolers, such that it’s totally misleading to describe them as “roughly equivalent except that SpaceX has a massive coordination advantage.” The only thing roughly equivalent about them is their energy consumption, which just goes to show energy consumption is not a useful metric here.
I totally agree that fewer, larger brains with experience advantages seem likely to outcompete many merely human-sized brains. In fact I think I agree with everything you said in this comment.
No, I agree that coordination is the ballgame, and there’s not a huge practical difference there.
Entirely separately, if we were worried about a treacherous turn due to a system being way above our capabilities, this lowers the probability of that, because there are clearer signals associated with the increase in intelligence needed to out-scheme a team of careful humans (large compute + energy usage). It’s not close to being a solution, but it does bound the tail of arbitrarily-pessimistic outcomes from small-scale projects suddenly FOOMing. It also introduces an additional monitorable real-world effect: a spike in energy usage noticeable by energy regulation systems.
Ah, I see. I think 10^9 is not a meaningful number to be talking about; long before there are 10^9 brain-equivalents worth of compute going into AI, we’ll be past the point of no return. But if instead you are talking about an amount of compute large enough that energy companies should be able to detect it, then yeah, this seems fairly plausible. Supercomputers can’t be hidden from energy companies as far as I know, and plausibly AGI will appear first in supercomputers, so plausibly wherever AGI appears, it’ll at least be known by some government that the project was underway.
I don’t think this meaningfully lowers the probability of a treacherous turn due to a system being way above our capabilities, though. That’s because I didn’t put much probability mass on secret-AGI-project-in-a-basement scenarios anyway. I guess if I had, then this would have updated me.
Crypto mining would be affected significantly as well, and potentially mostly instead of total energy use: intelligence is valuable computation per watt, and raising valuable-computation-per-watt changes the most valuable use of computers that would otherwise sit idle. So you’d expect projects bidding on this to overtake cryptocurrency mining as the best use of idle machines, whether that’s because a single project buys up computers and power, or because a cryptocurrency energy-wasting farm suddenly finds something directly valuable to do with its machines (and in fact it is already the case that ML can pay more than crypto mining).
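(As a toy illustration of that flip: the dollar figures below are entirely hypothetical and `best_use_of_idle_gpu` is a name I made up. The point is only that idle hardware follows whichever workload pays the most per kWh, so rising AI value-per-watt would show up first as a workload switch rather than as new total energy demand.)

```python
# Toy illustration; all numbers and names are hypothetical.
def best_use_of_idle_gpu(revenue_per_kwh: dict[str, float], electricity_cost_per_kwh: float) -> str:
    """Pick the workload with the highest profit per kWh (a crude value-per-watt proxy)."""
    profit = {workload: rev - electricity_cost_per_kwh for workload, rev in revenue_per_kwh.items()}
    return max(profit, key=profit.get)

# If ML inference pays more per kWh than mining, the "energy-wasting farm" switches
# workloads rather than drawing extra power, so total energy use barely moves.
print(best_use_of_idle_gpu({"crypto_mining": 0.12, "ml_inference": 0.30},
                           electricity_cost_per_kwh=0.08))  # -> ml_inference
```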
@jacob_cannell’s argument is simply that the brain has more to tell us about the structure of high-value-per-watt computation than AI philosophers expected. It does not mean the brain is at the absolute limit of generalized algorithmic energy efficiency (aka the only possible generalized intelligence metric); it only means that the physical limits on algorithmic energy efficiency must be obeyed by any intelligent system, and while there may be large asymptotic speedups from larger-scale structural improvements, the local efficiency of the brain is nothing to sneeze at.
Perhaps ASI could be reached earlier by “wasting” energy on lower value-per-watt AI projects; in fact, available research progress gives no reason to believe otherwise. All AI progress that has ever occurred, after all, has been on compute substrate with lower generalized value-per-watt than human brains provide. But in return for running on thermodynamically inefficient computers, it gets benefits that can economically compete with humans (e.g. algorithmic specialization, high-precision math, or exact repeatability), and thereby AI research makes progress towards ever-increasing value-of-compute-output-per-watt.
If a system is AGI, that means it is within a constant factor of the human brain’s energy efficiency for nearly all tasks (potentially a large constant factor, but a constant factor nonetheless). If it’s just barely a general superintelligence and is wildly inefficient at small scales, then the only way it could be a superintelligence is that it scales (maybe just barely) better than the brain with problem difficulty: extracting asymptotically better value-per-watt than an equivalently scaled system of humans consuming the same number of watts, due to what must ground out to improved total-system thermodynamic efficiency per unit of useful computation.
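(One way to make that scaling condition precise, using my own notation rather than anything from the comments above: let V(W) denote the useful value a system extracts from a power budget of W watts.)

```latex
% Sketch formalization; V(W) and c are my notation, not the commenters'.
% "Within a (possibly large) constant factor of the brain" on individual tasks:
\[
  V_{\mathrm{AGI}}(W) \;\ge\; \frac{1}{c}\, V_{\mathrm{brain}}(W) \quad \text{for some fixed } c \ge 1,
\]
% while "superintelligent only via better scaling" additionally requires
\[
  \lim_{W \to \infty} \frac{V_{\mathrm{AGI}}(W)}{V_{\mathrm{humans}}(W)} \;>\; 1,
\]
% i.e. asymptotically better value-per-watt than an equally powered team of humans.
```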
Your proposal seems to be that we should expect a large-scale multi-agent AI system to be superintelligent in this larger-scale asymptotic respect, even though the human brain has shockingly high interconnect efficiency and basic thermal compute efficiency. I have no disagreement. What this does tell us is that deep learning has neither a unique expected qualitative advantage nor a unique expected qualitative disadvantage vs the brain. If it becomes able to find more energy-efficient routes through its processing substrate’s spacetime (i.e. more energy-efficient algorithms, i.e. more intelligent algorithms), then it wins. Predicting when that will happen, which teams are close, and guaranteeing safety becomes the remaining issue: guaranteeing that the resulting system does not cause mass energy-structure (aka data) loss (e.g. death, body damage, injury, memory loss, HDD corruption/erasure, failure to cryonically freeze as-yet-unrepairable beings, etc.) nor interfere significantly with the values of living beings (torture, energy-budget squeeze, cryonic freezing of beings who wish to continue operating, etc.).
(Due to the cycles seen in evolutionary game theory, I suspect that an unsafe or bad-at-distributed-systems-fairness AGI mega-network will moderately quickly collapse with high-defection-rate issues similar to those of the human society we have; and if it exterminates and then succeeds humanity, I’d guess it will eventually evolve a large-scale cooperative system again; but there’s no reason to believe it wouldn’t kill us first. Friendly multi-agent systems are the hardest part of this whole thing, IMO.)
Great. Thanks.