Rubi J. Hudson comments on Simplifying Corrigibility – Subagent Corrigibility Is Not Anti-Natural

Rubi J. Hudson 24 Jul 2024 7:58 UTC
I don’t think we have the right tools to make an AI take actions that are low impact and reversible. But if we can develop them, the plan as I see it would be to implement those properties to avoid manipulation in the short term, and to use that time to go from a corrigible AI to a fully aligned one.