If I had to say where it fails, it's that it isn't robust to relative scale. One thing I notice in real-world history is that it really requires roughly equivalent power levels, and without that, it goes wrong fast (human treatment of non-pet animals is a good example).
But you could design training runs that include agents with very different amounts of compute and measure how stable the pro-social behavior is. You could also try to determine how the instinct parameters have to be tuned to keep that pro-social behavior stable.