I have to be honest, I’m skeptical. If we study how human prosociality works, my expectation is that we learn enough to produce some toy models with very simplistic pro-sociality, but that seems insufficient for generating an AI capable of navigating tough moral dilemmas, or even just situations sufficiently far off-distribution. The reason we want humans in the loop is not that they are vaguely pro-social but that humans can handle novel situations.
———————————————-
Actually, I shouldn’t completely rule out the value of this research. I think its failure will be an utterly boring result, but perhaps there is value in seeing concretely how it fails. That said, it’s unlikely to be worth the effort if it takes a lot of neuroscience research just to construct these examples of failure.
If I had to say where it fails: it isn’t robust to differences in relative scale. One thing I notice in real-world history is that pro-sociality really requires roughly equivalent power levels, and without that it goes wrong fast (human treatment of non-pet animals is a good example).
But you could design training runs that include agents with very different amounts of compute and see how stable the pro-social behavior is. You could also try to determine how the instinct parameters would have to be tuned to keep the pro-social behavior stable. A rough sketch of what such an experiment might look like is below.
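To make the suggestion concrete, here is a minimal sketch of that kind of sweep. Everything in it is hypothetical: the environment factory, the training function, the prosociality metric, and the specific budgets and instinct strengths are placeholders for whatever the actual research program would use, not an existing API.

```python
# Hypothetical sketch: sweep compute asymmetry and "instinct" strength,
# then measure how stable pro-social behavior remains. The callables
# passed in (env_factory, train_agent, prosociality_score) are assumed
# to be supplied by the actual experiment; nothing here is a real library.
import itertools

# Compute budgets spanning several orders of magnitude, to create a large
# power asymmetry between agents trained in the same environment.
COMPUTE_BUDGETS = [1e6, 1e8, 1e10]  # e.g. gradient steps or FLOPs per agent

# Strength of the built-in pro-social reward shaping ("instinct parameter").
INSTINCT_STRENGTHS = [0.1, 0.5, 1.0, 2.0]


def run_experiment(env_factory, train_agent, prosociality_score, seeds=range(5)):
    """For each instinct strength and seed, train one strong and one weak
    agent in a shared environment and record how pro-socially the strong
    agent treats the weak one."""
    results = {}
    for instinct, seed in itertools.product(INSTINCT_STRENGTHS, seeds):
        env = env_factory(seed=seed)
        # Maximal power asymmetry: strongest vs. weakest compute budget.
        strong = train_agent(env, compute=max(COMPUTE_BUDGETS), instinct=instinct)
        weak = train_agent(env, compute=min(COMPUTE_BUDGETS), instinct=instinct)
        # Does the pro-social behavior survive the power gap?
        results[(instinct, seed)] = prosociality_score(strong, weak, env)
    return results
```

The interesting output would be the boundary in instinct-strength space where pro-sociality toward the weaker agent collapses as the compute gap widens.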