Like Gwern said, the goal is to win. If EfficientZero gets superhuman data-efficiency by “cheating,” well it still got superhuman data-efficiency...
I think a relevant comparison here would be total cost. EfficientZero took 7 hours on 4 GPUs to master each particular Atari game. Equipment probably cost around $10,000 last I checked. How long is the lifespan of a GPU? Two years? OK, so that’s something like 20,000 hours of 4 GPUs’ time for $10K, so $0.50 for one hour, so $3.50 for the training run? Eh, maybe it’s a bit more than that due to energy costs or something. But still it seems probable that it beats human wages.
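To make that back-of-envelope explicit, here’s a minimal sketch; the $10K rig cost, 2-year lifespan, and 7-hour run are the rough assumptions above, not measured figures:

```python
# Back-of-envelope cost of one EfficientZero training run,
# using the rough figures above (all assumptions, not measurements).
rig_cost_usd = 10_000     # ~$10K for a 4-GPU rig
lifespan_hours = 20_000   # ~2 years of useful life, rounded up from ~17.5k h
train_hours = 7           # 7 hours on the rig per Atari game

cost_per_hour = rig_cost_usd / lifespan_hours   # $0.50/h for all 4 GPUs
run_cost = cost_per_hour * train_hours          # $3.50 per game

print(f"hourly rig cost: ${cost_per_hour:.2f}")
print(f"per-game training cost: ${run_cost:.2f}")
```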
For computational expenditure cost… I think 4 GPUs would be doing something like 10^14 FLOPS, which is one OOM less than the human brain?
All that being said, I do take your point that it matters if EfficientZero is getting results via an importantly different method than the human method, because that should give us a shred of hope that what EfficientZero is doing, while equally viable for this task, would be less viable for really important or dangerous tasks. However it could also be more viable. (For example, maybe the human method is “draw upon decades of pre-training and background knowledge. Also benefit from the fact that the game was literally designed for humans.” Plausibly, that’s our secret sauce. In which case EfficientZero is just… qualitatively smarter than us in some sense. Fewer dollars, fewer FLOPs, etc. Compensates for the lack of secret sauce by thinking really hard about its experiences, in a way we simply can’t because our brains aren’t fast enough.)
Hmm, I think my comment came across as setting up a horse-race between EfficientZero and human brains, in a way that I didn’t intend. Sorry for bad choice of words. In particular, when I wrote “how AI compares to human brains”, I meant in the sense of “In what ways are they similar vs different? What are their relative strengths and weaknesses? Etc.”, but I guess it sounded like I was saying “human brain algorithms are better and EfficientZero is worse”. Sorry.
I could write a “human brain algorithms are fundamentally more powerful than EfficientZero” argument, but I wasn’t trying to, and such an argument sure as heck wouldn’t fit in a comment. :-)
If EfficientZero gets superhuman data-efficiency by “cheating,” well it still got superhuman data-efficiency...

Sure. If Atari sample efficiency is what we ultimately care about, then the results speak for themselves. For my part, I was using sample efficiency as a hint about other topics that are not themselves sample efficiency. For example, I think that if somebody wants to understand AlphaZero, the fact that it trained on 40,000,000 games of self-play is a highly relevant and interesting datapoint. Suppose you were to then say “…but of those 40,000,000 games, fundamentally it really only needed 100 games with the external simulator to learn the rules. The other 39,999,900 games might as well have been ‘in its head’. This was proven in follow-up work.” I would reply: “Oh. OK. That’s interesting too. But I still care about the 40,000,000 number. I still see that number as a very important part of understanding the nature of AlphaZero and similar systems.”
(I’m not sure we’re disagreeing about anything…)
One thing to note is that we don’t know how many games humans are playing in their heads, in some sense; we don’t have access to that kind of information about our own algorithms. And if you think we don’t play games in our heads just because we don’t consciously experience or remember them, that’s obviously wrong. Every time a thought pops out of nowhere, or you get a eureka! moment from the incubation effect, or you have a Tetris-effect dream (or consider all the experience-replay hippocampus neuroscience), you see how it feels to have powerful subconscious algorithms churning away on difficult problems without you having any awareness of it: it feels like nothing. But those algorithms still take wallclock years to reach levels of performance that something like AlphaZero reaches in hours...
That’s an interesting thought. My hunch is that hippocampal replay can’t happen unconsciously, because if the hippocampus broadcasts a memory at all, it broadcasts it broadly to the cortex, including the GNW (global neuronal workspace). That’s just my current opinion; I’m not sure if there’s neuroscience consensus on that question.
Here I’m sneaking in an assumption that “activity in the GNW” = “activity that you’re conscious of”. Edge cases include times when there’s stuff happening in the GNW, but it’s not remembered after the fact (at least, not as a first-person episodic memory). Are you “conscious” during a dream that you forget afterwards? Are you “conscious” when you’re ‘blacked out’ from drinking too much? I guess I’d say “yes” to both, but that’s a philosophy question, or maybe just terminology.
If we want more reasons that human-vs-EfficientZero comparisons are not straightforward, there’s also the obvious fact that humans benefit from transfer-learning whereas EfficientZero starts with random weights.
It’s EfficientZero; EfficientNet is an entirely different model architecture in computer vision.
Oops, thanks, just fixed it.
AZ was a population of agents reinventing chess/go from scratch, so on a game-count basis it’s more comparable to all human expert/pro games over history.
But yeah, EZ is basically a superspeed GPU Atari simulator combined with MCTS.
A single 3090 can do 250 TF ideally with tensor cores, and an A100 is 2x that, so 4 GPUs is >10^15 FLOPS theoretical. I’d also argue the brain is closer to 10^14, but this comparison is all kinda mucky because the two are so different. And as of today the GPU only hits those numbers on big dense matrix codes, whereas the brain is more fully sparse, so that’s probably another 2 to 4 OOM advantage for the brain.
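For concreteness, here’s that arithmetic as a tiny sketch; the 250 TF per 3090 and the 10^14 FLOPS brain figure are this comment’s rough estimates, not settled numbers:

```python
import math

# Order-of-magnitude comparison using the figures claimed above.
gpu_flops = 250e12         # ~250 TF per 3090 with tensor cores (ideal, claimed)
rig_flops = 4 * gpu_flops  # ~1e15 FLOPS theoretical for the 4-GPU rig
brain_flops = 1e14         # the "brain is closer to 10^14" estimate

oom_gap = math.log10(rig_flops / brain_flops)
print(f"rig: {rig_flops:.0e} FLOPS, brain: {brain_flops:.0e} FLOPS")
print(f"gap: ~{oom_gap:.0f} OOM, before the 2-4 OOM sparsity caveat")
```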
Yes, thinking hard and fast in small simulated subspaces is an AGI/SIM superpower (related old post). But it’s still technically quantitative?