I largely disagree about the intrinsic motivation/reward function points. There is a lot of evidence that there is at least some amount of general intelligence which is independent of interest in particular fields/topics. Of course, if you have a high level of intelligence + interest then your dataset will be heavily oriented towards that topic and you will gain a lot of skill in it, but the underlying aptitude/intelligence can be factored out of this.
How exactly specific interests are encoded is a different and also super fascinating question! It definitely isn’t a pure ‘bit prediction’ intrinsic curiosity, since different people seem to care a lot about different kinds of bits. It is at least somewhat affected by external culture/datasets but not entirely (people are often interested in things despite cultural pressure, or before they really know what their interest even is). It doesn’t seem super influenced by external reward in a lot of cases. To some extent it ties in with intrinsic aptitude (people tend to be interested in things they are good at), but of course this is at least somewhat circular, since people tend to get better at things they are interested in, ceteris paribus.
The hyperparameters point is a good one. I was thinking about this largely in terms of architectural changes, but I think I was wrong about that: hyperparameters are much more continuous and also potentially much more flexible genetically. This seems a better and more likely explanation for continuous IQ distributions than architecture directly. It would definitely be interesting to know how robust the brain is to these kinds of hyperparameter variations (i.e. over what range do people vary, and is the variation systematic?). In ML, my understanding is that at large scale models are generally pretty robust to small hyperparameter variations (allowing people to get away with cargo-culting hyperparams from other related papers instead of always sweeping themselves), although of course really bad hyperparams destroy performance. The brain may also be less stable, due to some combination of recurrent dynamics and active data selection leading to positive or negative feedback loops, as well as just more unusual architectural hyperparameters leading to more interactions and more ways for things to go wrong.
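To make the "robust to small variations, destroyed by really bad ones" intuition concrete, here is a minimal toy sketch (my own illustration, not anything from the discussion above): plain gradient descent on a synthetic regression problem, where nearby learning rates land on essentially the same final loss while a wildly wrong one diverges. All the specific values are arbitrary.

```python
# Toy illustration of hyperparameter robustness: small perturbations around a sane
# learning rate barely change the final loss, while a really bad value destroys it.
# Full-batch gradient descent on synthetic linear regression; everything is made up.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(256, 8))
true_w = rng.normal(size=8)
y = X @ true_w + 0.1 * rng.normal(size=256)

def train(lr, steps=500):
    """Full-batch gradient descent on mean-squared error; returns final training MSE."""
    w = np.zeros(8)
    for _ in range(steps):
        grad = 2.0 / len(y) * X.T @ (X @ w - y)
        w -= lr * grad
    return float(np.mean((X @ w - y) ** 2))

# Small perturbations around a reasonable learning rate: final losses are nearly identical.
for lr in (0.03, 0.05, 0.08):
    print(f"lr={lr:<5} final MSE = {train(lr):.4f}")

# A really bad hyperparameter: the updates diverge and the loss blows up
# (numpy may emit overflow warnings here; the result comes out as inf/nan).
print(f"lr=1.5   final MSE = {train(1.5):.3g}")
```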