12.4.4 No easy guarantees about what we’ll get with social-instinct AGIs
1. Good news! Our AGI is inside the human distribution in every respect. Therefore, we can look at humans and their behavior, and absolutely everything we see will also apply to the AGI.
2. Let’s try to understand exactly how innate social instincts combine with life experience (a.k.a. training data) to form human moral intuitions.
If the AGI is not in the human distribution in every respect (and it won’t be), then we need to develop the (more difficult) 2nd type of argument, not the 1st.
There could be a hybrid: while we can’t experiment with real humans, we can experiment with simulated agents and observe how the distribution of their behavior shifts as the steering system’s parameters vary. It seems plausible that we could then tune those parameters to sufficiently limit out-of-distribution behavior (or discover that this isn’t possible with the number of parameters available).
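To make the simulation idea concrete, here is a toy sketch of such a parameter sweep. Everything in it is an assumption made for illustration: `simulate_agent`, the single `empathy_gain` parameter, the scalar "behavior score", and the acceptable range `[low, high]` are hypothetical stand-ins, not a model of any real steering system. A real agent simulation would be vastly more complex, and it is an open question whether behavior could be summarized this way at all.

```python
import random


def simulate_agent(empathy_gain, noise_scale, rng):
    """Hypothetical stand-in for a full agent simulation.

    Returns one scalar "behavior score". The innate parameter sets the
    baseline; random "life experience" (training data) perturbs it.
    """
    life_experience = rng.gauss(0.0, 1.0)
    return empathy_gain + noise_scale * life_experience


def out_of_range_fraction(empathy_gain, noise_scale, low, high,
                          n_agents=2000, seed=0):
    """Simulate a population and report the fraction outside [low, high]."""
    rng = random.Random(seed)  # fixed seed so runs are reproducible
    scores = [simulate_agent(empathy_gain, noise_scale, rng)
              for _ in range(n_agents)]
    outside = sum(1 for s in scores if s < low or s > high)
    return outside / n_agents


def sweep(gains, noise_scale, low, high):
    """Map each candidate parameter value to its out-of-range fraction."""
    return {g: out_of_range_fraction(g, noise_scale, low, high)
            for g in gains}


# Assumed "acceptable" behavior range; in practice this would have to be
# derived from some reference distribution.
results = sweep([0.0, 0.5, 1.0, 1.5, 2.0], noise_scale=0.5, low=0.0, high=2.0)
best_gain = min(results, key=results.get)

for gain, frac in results.items():
    print(f"empathy_gain={gain:.1f}  out-of-range fraction={frac:.3f}")
print("best gain in this toy sweep:", best_gain)
```

Even in this toy, the best setting still leaves some agents outside the range, which is the second outcome mentioned above: the sweep can show that the available parameters are not enough to rule out out-of-distribution behavior.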