Shoshannah Tekofsky comments on Naive Hypotheses on AI Alignment

Shoshannah Tekofsky 3 Jul 2022 8:16 UTC
1 point
Thank you!

If they are not orthogonal, then presumably prosociality and power are inversely related, which is worse?

In this case, I’m hoping that intelligence and prosociality-that-is-robust-to-absolute-power are positively correlated. However, I struggle to think how this might actually be tested… My intuitions may be born from the Stanford Prison Experiment, which I think has since been refuted. So maybe we don’t actually have as much data on prosociality in extreme circumstances as I initially intuited. I’m mostly reasoning this out on the fly by zooming in on where my thoughts may have originally come from.

That said, it doesn’t matter much how frequent robust prosocial traits are, as long as they exist and can be recreated in AGI.

I’ll DM you my discord :)