For an AI to exploit safety plans, it would first need a goal of being unsafe. Most of the safety plans we have are aimed at preventing an AI from developing such goals in the first place.
If the AI wants to be aligned, it might very well be helpful for it to know about a variety of different plans for making aligned AI.
Threat modeling is important in any security work, and I would expect that disagreement with your threat model is the main reason your post wasn't better received last time. The information from the interaction with ChatGPT doesn't address any of those cruxes.