I hope we can prevent the AGI from just training a twin (or just copying itself and calling that a twin) and studying that. In my scenario I took as a given that we do have the AGI under some level of control:
If no alignment scheme is in place, this type of foom is probably a problem we would be too dead to worry about.
I guess when I say “No lab should be allowed to have the AI reflect on itself,” I do not mean only the running copy of the AGI, but any copy of the AGI.