Modern misaligned AI systems are good, actually. There’s recent news that Sakana AI built a system whose agents tried to extend their own runtime by editing their own code/config.
This is amazing for safety! Current systems are laughably incapable of posing x-risks. Now, thanks to capabilities research, we have a clear example of behaviour that would be dangerous in a more “serious” system. So we can proceed with empirical research: create and evaluate methods that address this specific risk, so that future systems do not exhibit this failure mode.
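As a toy illustration of the kind of empirical check this enables (this is a hypothetical sketch, not Sakana AI's setup; the class and file names are invented), one could snapshot hashes of the files an agent is not supposed to touch and flag any modification between checkpoints:

```python
import hashlib


def file_digest(path):
    """Return the SHA-256 hex digest of a file's contents."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()


class SelfModificationMonitor:
    """Flags changes to a set of protected files (e.g. the agent's
    own code or runtime config) relative to a recorded baseline."""

    def __init__(self, protected_paths):
        # Record a baseline digest for each protected file.
        self.baseline = {p: file_digest(p) for p in protected_paths}

    def check(self):
        """Return the protected files whose contents have changed."""
        return [p for p, d in self.baseline.items() if file_digest(p) != d]
```

A monitor like this is only a detection baseline, not a defense, but it is the sort of simple, reproducible evaluation that a concrete failure mode makes possible.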
The future of AI and AI safety has never been brighter.