Apart from the potential to speed up foom, there is also a more prosaic reason why interpretability by other AIs or humans could be dangerous: interpretability could reveal infohazardous reasoning that arises en route to inferring aligned, ethical plans: https://www.lesswrong.com/posts/CRrkKAafopCmhJEBt/ai-interpretability-could-be-harmful. So I suggested that we may need to go as far as cryptographically obfuscating the AI's reasoning process that leads to "aligned" plans.
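To make the suggestion a bit more concrete, here is a minimal sketch of what "sealing" the reasoning trace could look like: the intermediate reasoning is encrypted as soon as it is produced, only the final plan stays in the clear, and the decryption key sits with a separate escrow or oversight party rather than with whoever inspects the model. The names `run_with_sealed_reasoning` and `model.generate_with_trace` are hypothetical, and symmetric encryption here just stands in for whatever scheme (possibly something far more sophisticated) would actually be used.

```python
# Sketch only: encrypt the reasoning trace at generation time so that
# interpretability of the *output* does not expose the infohazardous
# intermediate reasoning; the key is held by a hypothetical escrow body.
from cryptography.fernet import Fernet  # placeholder for the real obfuscation scheme

def run_with_sealed_reasoning(model, prompt, escrow_key: bytes):
    """Return the plan in the clear and the reasoning only as ciphertext."""
    reasoning_trace, plan = model.generate_with_trace(prompt)  # hypothetical model API
    sealed_trace = Fernet(escrow_key).encrypt(reasoning_trace.encode())
    return plan, sealed_trace  # only the escrow-key holder can read the trace

# escrow_key = Fernet.generate_key()  # generated and stored by the oversight body, not the inspector
```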