But what stops any of those "general, agentic learning systems" in the class "aligned to human values" from going meta about its values at any time and choosing different values to operate with? Is the hope to align the agent and then monitor it constantly to prevent deviation? If so, why wouldn't such monitoring be practically impossible, given that we are dealing with an agent that will supposedly be able to out-calculate us at every step?