I noticed this but didn’t explicitly point it out. My point was that when paulfchristiano said:
If the AI has a simple goal—press the button—then I think it is materially easier for the AI to modify itself while preserving the button-pressing goal [...] the problem is difficult, but I don’t think it is in the same league as friendliness
he was also assuming that he could handle your objections, e.g. that his AI wouldn't find a loophole in the definition of "pressing a button". So the problem he described was not, in fact, simpler than the general problem of FAI.
I don’t think you’ve noticed that this is just moving the fundamental problem to a different place. For example, you haven’t specified things like:
Don’t lie to AI 1 about your actions
Don’t persuade AI 1 to modify itself
Don’t find loopholes in the definition of “AI 1” or “modify”
etc., etc. If you could enforce all of these constraints on a superintelligence as it self-modifies, you'd already have solved the general FAI problem.
IOW, what you propose isn’t actually a reduction of anything, AFAICT.