Update: According to this the human brain actually is getting ~10^7 bits of data every second, although the highest level conscious awareness is only processing ~50. So insofar as we go with the “tokens” definition, it does seem that the human brain is processing plenty of tokens for its parameter count -- 10^16, in fact, over the course of its lifetime. More than enough! And insofar as we go with the “single pass through the network” definition, which would mean we are looking for about 10^12… then we get a small discrepancy; the maximum firing rate of neurons is 250 − 1000 times per second, which means 10^11.5 or so… actually this more or less checks out I’d say. Assuming it’s the max rate that matters and not the average rate (the average rate is about once per second).
Does this mean that it may not actually be true that humans are several OOMs more data-efficient than ANNs? Maybe the apparent data-efficiency advantage is really mostly just the result of transfer learning from vast previous life experience, just as GPT-3 can “few-shot learn” totally new tasks, and also “fine-tune” on relatively small amounts of data (3+ OOMs less, according to the transfer laws paper!) but really what’s going on is just transfer learning from its vast pre-training experience.
Update: According to this the human brain actually is getting ~10^7 bits of data every second, although the highest level conscious awareness is only processing ~50. So insofar as we go with the “tokens” definition, it does seem that the human brain is processing plenty of tokens for its parameter count -- 10^16, in fact, over the course of its lifetime. More than enough! And insofar as we go with the “single pass through the network” definition, which would mean we are looking for about 10^12… then we get a small discrepancy; the maximum firing rate of neurons is 250 − 1000 times per second, which means 10^11.5 or so… actually this more or less checks out I’d say. Assuming it’s the max rate that matters and not the average rate (the average rate is about once per second).
Does this mean that it may not actually be true that humans are several OOMs more data-efficient than ANNs? Maybe the apparent data-efficiency advantage is really mostly just the result of transfer learning from vast previous life experience, just as GPT-3 can “few-shot learn” totally new tasks, and also “fine-tune” on relatively small amounts of data (3+ OOMs less, according to the transfer laws paper!) but really what’s going on is just transfer learning from its vast pre-training experience.