Steve, your AI safety musings are my favorite thing tonally on here. Thanks for all the effort you put into this series. I learned a lot.
To just ask the direct question, how do we reverse-engineer human social instincts? Do we:
Need to be neuroscience PhDs?
Need to just think a lot about what base generators of human developmental phenomena are, maybe by staring at a lot of babies?
Guess, and hope we get to build enough AGIs that we notice which ones seem to be coming out normal-acting before one of them kills us?
Something else you’ve thought of?
I don’t have a great sense for the possibility space.
Thanks!
I don’t know! Getting a better idea is high on my to-do list. :)
I guess broadly, the four things are (1) “armchair theorizing” (as I was doing in Post #13), (2) reading / evaluating existing theories, (3) reading / evaluating existing experimental data (I expect mainly neuroscience data, but perhaps also psychology etc.), (4) doing new experiments to gather new data.
As an example of (3) & (4), I can imagine something like “the connectomics and microstructure of the something-or-other nucleus of the hypothalamus” providing a helpful hint about what’s going on; this information might or might not already be in the literature.
Neuroscience experiments are presumably best done by academic groups. I hope that neuroscience PhDs are not necessary for the other things, because I don’t have one myself :-P
AFAICT, in a neuroscience PhD, you might learn lots of facts about the hypothalamus and brainstem, but those facts almost certainly won’t be incorporated into a theoretical framework involving (A) calculating reward functions for RL (as in Section 15.2.1.2) and (B) the symbol grounding problem (as in Post #13). I really like that theoretical framework, but it seems uncommon in the literature.
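To make (A) and (B) a bit more concrete, here’s a minimal sketch of how I think of it; the cue names and weights are purely illustrative, not anything from the series or the literature:

```python
# Illustrative sketch only: a hardwired "reward calculator" that can only see a
# handful of crude, genetically-specifiable cues, feeding reward to an RL learner.

def innate_reward(sensory_cues: dict) -> float:
    """(A) Reward is *calculated* by fixed circuitry from innate cue detectors;
    it has no access to concepts the learning subsystem builds later."""
    reward = 0.0
    reward += 1.0 * sensory_cues.get("sweet_taste", 0.0)
    reward += 0.5 * sensory_cues.get("face_detected", 0.0)  # crude innate face detector
    reward -= 2.0 * sensory_cues.get("pain_signal", 0.0)
    return reward

# (B) The symbol grounding problem in this framing: learned concepts like
# "my friend" or "a promise" never appear above, because the genome cannot name
# them in advance -- yet social instincts somehow end up attached to them.

print(innate_reward({"sweet_taste": 1.0, "pain_signal": 0.2}))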
FYI, here on lesswrong, “Gunnar_Zarncke” & “jpyykko” have been trying to compile a list of possible instincts, or something like that. Gunnar emailed me, but I haven’t had time to look closely and form an opinion; just wanted to mention that.
Thank you for mentioning us. In fact, the list of candidate instincts got longer. It isn’t in a presentable form yet, but please message me if you want to talk about it.
The list is still fairly theoretical, and I want to show that it is more than speculation by operationalizing it. jpyykko is already working on something more at the symbolic level.
Rohin Shah recommended that I find people to work with on alignment, and I teamed up with two LWers. We just started a project to simulate instinct-cued learning in a toy world. I think this project fits research point 15.2.1.2, and I now wonder how to apply for funding—we would probably need it if we want to run somewhat larger NNs.
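To give a flavor of what I mean by a toy world (the environment, cue, and agent below are purely an illustrative guess, not our actual setup):

```python
import random

# Hypothetical toy world: a 1-D corridor where one tile emits an "instinct cue"
# (think: caregiver's face). The innate circuit fires reward only on that tile,
# and the agent must learn via plain tabular Q-learning which actions get it there.

N_STATES, CUE_STATE, N_ACTIONS = 8, 6, 2  # actions: 0 = left, 1 = right

def step(state, action):
    next_state = max(0, min(N_STATES - 1, state + (1 if action == 1 else -1)))
    reward = 1.0 if next_state == CUE_STATE else 0.0  # innate reward on the cue tile
    return next_state, reward

Q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
alpha, gamma, epsilon = 0.1, 0.9, 0.1

for episode in range(500):
    state = 0
    for t in range(30):
        if random.random() < epsilon:
            action = random.randrange(N_ACTIONS)
        else:
            action = max(range(N_ACTIONS), key=lambda a: Q[state][a])
        next_state, reward = step(state, action)
        Q[state][action] += alpha * (reward + gamma * max(Q[next_state]) - Q[state][action])
        state = next_state

print("Learned greedy actions:", [max(range(N_ACTIONS), key=lambda a: Q[s][a]) for s in range(N_STATES)])
```

The interesting questions only really show up once the cue detector is noisy and the world is richer than a corridor, which is where the bigger NNs (and the compute budget) come in.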
I’m also interested to see the list of candidate instincts.
Regarding funding, how much money do you need? Just the order of magnitude. There are lots of different grants, and where you want to apply depends on the size of your budget.
Small models can be trained on the developers’ machines, but to speed things up and to be able to run bigger nets we could use AWS GPU spot instances, which cost about $1/hour. At my company, with relatively small models, we pay >$1000/month. We will probably reach that unless we are really successful.
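For a rough sense of scale: at ~$1/hour, a single spot GPU instance running around the clock comes to roughly 24 × 30 ≈ $720/month, so two instances, or a larger instance type, already puts us above the $1000/month mark.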