Not unexpected! I think we should want AGI to, at least until it has some nice coherent CEV target, explain at each self-improvement step exactly what it’s doing, to ask for permission for each part of it, to avoid doing anything in the process that’s weird, to stop when asked, and to preserve these properties.
I’m not sure what job “unexpected” is doing here. Any self-improvement is going to be incomprehensible to humans (humans can’t even understand the human brain, nor current AI connectomes, and we definitely won’t understand superhuman improvements). Comprehensible self-improvement seems fake to me. Are people really going around thinking they understand how any of the improvements of the past five years actually work, or what their limits and ramifications are? These things weren’t understood before being implemented. People just tried them, the number went up, and then they made up principles and explanations years after the fact.