I agree with your reasoning.
[Thus] a machine could develop concepts of “good”, “ethical”, and “utility-maximizing” that are just as robust as mine, if not more so.
A corollary would be that figuring out human values is not enough to make a machine safe, at least not if you look at the NN stack alone.