Wouldn’t this be similar to how a Neural Network “disregards” training data that it has already seen?
I don’t know how that’s done, sorry. Does it literally throw away the data without using it for anything whatsoever (and does it do this for on the order of 99.9% of the training data set)? Or does it process the data, but because the data is redundant it has little or no effect on the model weights? I’m talking about the former, since the vast majority of our visual data never makes it from the retina to the optic nerve. The latter would be something more like how looking at my bedroom wall yet again has little to no effect on my understanding of any aspect of the world.
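To make the distinction concrete, here’s a minimal sketch (purely illustrative, not a claim about how any particular model is actually trained) of the two senses of “disregarding” data: filtering it out before training ever sees it, versus processing it but getting a near-zero loss so the weights barely move:

```python
import math

# Sense 1: redundant examples are filtered out *before* training, so the model
# never processes them at all (analogous to data never leaving the retina).
raw_examples = ["wall", "wall", "wall", "cat", "wall", "dog"]
deduplicated = list(dict.fromkeys(raw_examples))  # ["wall", "cat", "dog"]

# Sense 2: the example is processed, but the model already predicts it well,
# so the loss (and hence the gradient / weight update) is nearly zero.
def cross_entropy(p_correct):
    return -math.log(p_correct)

print(cross_entropy(0.999))  # ~0.001: tiny loss -> tiny update (bedroom wall)
print(cross_entropy(0.10))   # ~2.3: large loss -> substantial update (novel data)
```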
And to your second point, yeah I was pretty unclear, sorry. I meant: your original calculation was that a human at age 30 has ~31,728T tokens worth of data, compared to ~1T for GPT4. The human has 31,728 times as much, and log(31,728) is about 4.5, meaning the human has ~4.5 OOMs more training data. But if I’m right that you should cut the human training data estimate down by ~1000x, because that much is thrown away before it gets processed in the brain at all, then we’re left with a human at age 30 having only ~31.7x as much; log(31.7) ≈ 1.5, i.e. the human has ~1.5 OOMs more training data. The rest of that comment was me indicating that that’s just how much data gets to the brain in any form, not how much is actually being processed for training purposes.
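For what it’s worth, here’s the arithmetic spelled out (the token counts are the estimates from upthread, and the ~1000x cut for data discarded before it reaches the brain is my assumption):

```python
import math

human_tokens = 31_728e12  # ~31,728T tokens of sensory data by age 30 (estimate from upthread)
gpt4_tokens = 1e12        # ~1T training tokens (figure used upthread)

ratio = human_tokens / gpt4_tokens
print(ratio, math.log10(ratio))        # 31728x  ->  ~4.5 OOMs

# Assumed ~1000x reduction for data thrown away before ever reaching the brain
adjusted = ratio / 1000
print(adjusted, math.log10(adjusted))  # ~31.7x  ->  ~1.5 OOMs
```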