It is quite blatant about what it does, but this is pretty much statistics hacking.
Like I said, there’s plenty of uncertainty in FLOP/s. Maybe it’s helpful if I rephrase this as an invitation for everyone to make their own modifications to the model.
I would compare a model being trained to the computations that a single brain does over its lifetime to configure itself (or restrict it to childhood only).
Cotra’s lifetime anchor is 10^29 FLOPs (so 4-5 OOMs above gradient descent). That’s still quite a chasm.
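As a back-of-the-envelope sketch of that gap (assuming, purely for illustration, a large gradient-descent training run on the order of 10^24 FLOP; that figure is my assumption, not from this thread):

```python
import math

# Rough inputs; both are order-of-magnitude estimates, not authoritative figures.
lifetime_anchor_flop = 1e29   # Cotra's lifetime anchor, per the comment above
training_run_flop = 1e24      # assumed ballpark for a large gradient-descent run

# The "chasm" expressed as orders of magnitude between the two estimates.
gap_ooms = math.log10(lifetime_anchor_flop / training_run_flop)
print(f"Gap: {gap_ooms:.1f} orders of magnitude")
```

Shifting the assumed training-run size by an OOM in either direction moves the gap to the 4-6 OOM range, which is why the comparison stays coarse.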
For the brain-evolution analog, I would also include the brain metabolism of the computer scientists developing the next version of the model.
Do you mean counting the computer scientists’ brain activity toward the computed cost of training the model?
I would not hold the inefficiency of the solar panels against the CPU. Likewise, I don’t see how it is fair to blame the brain for the inefficiency of the gut. If we can blame the gut, then we should compare how much the model causes its electricity supply to increase, which for many models is zero.
If you’re asking yourself whether or not you want to automate a certain role, then a practical subquestion is how much you have to spend on maintenance/fuel (i.e., electricity or food). In that case, I do think acknowledging the source of the fuel becomes important.
Yes, I think GPT-1 turning into GPT-2 and GPT-3 is what is analogous to building brains out of new combinations of DNA. An instance of GPT-3 honing its weights and a single brain pruning and forming its connections are comparable. When doing Fermi estimates, getting the ballpark wrong is pretty fatal, as the ballpark is the core of the activity. With that much conceptual confusion going on, I don’t care about the numbers. Claiming that others are making mistakes while not surviving a cursory look oneself does not bode well for convincingness. I don’t care to be lured by pretty graphs into thinking my ignorance is more informed than it is.
If I know that the researchers looked at the data until they found a correlation with p < 0.05, then the fact that they found something is not really significant news. Similarly, if you keep changing your viewpoint until you find an angle where the orderings seem to reverse, it is less convincing that this one viewpoint is the one that matters.
Economically, I would be interested in the ability to change electricity into sugar and sugar into electricity. But because the end products are not the same, the processes are not nearly economically interchangeable. Go a long way in this direction and you measure everything in dollars. But typically, when we care to specify that we care about energy efficiency and not, for example, time efficiency, we are going for more dimensions and more considerations rather than fewer.
Defining terminology so that the energy efficiency of everything that uses gas goes down whenever gas prices go up does not seem handy to me.