I guessed that’s what you meant but was curious whether I was right!
If the AI isn’t willing or able to fold itself up into something that can be run entirely on a single, human-inspectable CPU in an airgapped box, running code that’s amenable to easily proving things about its behavior, then you can just decline to cooperate with it (or decline to do whatever else you were planning to do by proving something about it) and shut it off instead.
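(To make the shape of that decision rule concrete, here’s a minimal, purely illustrative sketch in Python. Everything in it is hypothetical: `FoldedArtifact`, `proves_property`, and `decide` are just stand-in names for whatever inspection/proof-checking pipeline would actually exist.)

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FoldedArtifact:
    """Hypothetical: a self-contained, inspectable snapshot the AI claims to have folded into."""
    code: bytes
    claimed_property: str


def proves_property(artifact: FoldedArtifact) -> bool:
    """Stand-in for whatever formal or empirical verification is actually feasible.

    Conservatively returns False until a real verifier is plugged in.
    """
    return False


def decide(artifact: Optional[FoldedArtifact]) -> str:
    # No inspectable folded form offered at all: default to not cooperating.
    if artifact is None:
        return "shut off: no folded form to inspect"
    # Otherwise cooperate only if the claimed property actually checks out
    # on the folded, airgapped form.
    if proves_property(artifact):
        return "cooperate: property verified on the folded form"
    return "shut off: could not verify the claimed property"
```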
Any idea how a ‘folded-up’ AI would imply anything in particular about the ‘expanded’ AI?
If an AI ‘folded itself up’ and provably/probably ‘deleted’ its ‘expanded’ form (and all instances of it), as well as any other AIs or non-AI agents under its control, that does seem like it would be nearly “alignment-complete” (especially relative to our current AIs), even if, e.g., the AI expected to be able to escape that ‘confinement’.
But that doesn’t seem like it would work as a general procedure for AIs cooperating or even negotiating with each other.