Now that I think about it, the strongest takeaway from this post is that alignment is not equal to safety: even if an AI is controllable by its principal, it may not be safe for everyone else.
In your Fusion Power Generator scenario, the principal asked for a fusion power generator, and the AI successfully designed one. It was inner- and outer-aligned enough to the principal that it didn't take over the world to build fusion power plants, and in particular it didn't Goodhart the specification in a harmful way.
In essence, this seems like a standard case of AI misuse (though I don't exactly like the connotations of that term). So if I were writing this post, I'd focus on how aligning an AI isn't enough to ensure safety. Put another way, there are more problems in AI safety than just alignment/control problems.