An oscilloscope. Note that it isn’t particularly realistic to hook up a scope to the kind of hardware that current AI systems are typically trained and run on.
But what I was trying to gesture at with this comment is that this is the kind of problem that the AI might be able to help you with. If the AI isn’t willing or able to fold itself up into something that can be run entirely on a single, human-inspectable CPU in an airgapped box, running code that is amenable to easily proving things about its behavior, you can just not cooperate with it, or not do whatever else you were planning to do by proving something about it, and just shut it off instead.
(If the AI is already adversarial to the point where it won’t let you shut it off, and is running on a distributed system, you’ve already lost. Willingness to fold itself up and be shut off means that the AI is already pretty aligned; it wouldn’t surprise me if this problem is alignment-complete.)
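A minimal sketch of that decision rule, assuming nothing beyond what’s stated above (the function and flag names here are purely illustrative):

```python
def decide(ai_will_fold_up: bool, behavior_proof_verified: bool) -> str:
    """Toy rendering of the rule above: only cooperate with an AI that is
    willing and able to fold itself into a single, inspectable, airgapped
    form whose behavior you can actually verify; otherwise shut it off."""
    if ai_will_fold_up and behavior_proof_verified:
        return "cooperate"
    return "shut it off"
```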
As for your practical difficulties, I agree these are all problems. I am not saying the problem you pose isn’t hard, just that there doesn’t seem to be anything that makes it fundamentally impossible to solve in principle.
There is lots of academic research on hardware security and secure hardware, verifiable computing (e.g. using zk-SNARKs), formally verified programming, chain-of-trust, etc. that attempts to make progress on small pieces of this problem (not necessarily with a specific focus on AI). Stitching all of these things together into a genuinely end-to-end secure system for interacting with a smarter-than-human AI system is probably possible, but it will require solving many unsolved problems and designing and building AI systems in different ways than we currently do. IMO, it’s probably better to just build an AI that provably shares human values from the start.
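To make the chain-of-trust piece slightly more concrete, here is a toy sketch (my own illustration, not drawn from any of that research; real attestation schemes add a hardware root of trust and cryptographic signatures on top of plain hashing). Each stage of a pipeline publishes a hash that commits to the stage before it, so a verifier holding only the trusted root can detect tampering anywhere downstream:

```python
import hashlib

def link_hash(prev_hash: bytes, payload: bytes) -> bytes:
    """Hash committing to both the previous link and this stage's payload."""
    return hashlib.sha256(prev_hash + payload).digest()

def build_chain(root: bytes, payloads: list[bytes]) -> list[bytes]:
    """Build the chain of commitments starting from a trusted root."""
    chain = [root]
    for p in payloads:
        chain.append(link_hash(chain[-1], p))
    return chain

def verify_chain(root: bytes, payloads: list[bytes], chain: list[bytes]) -> bool:
    """Recompute the chain from the trusted root and compare link by link."""
    return chain == build_chain(root, payloads)

if __name__ == "__main__":
    root = hashlib.sha256(b"trusted hardware root").digest()
    stages = [b"firmware v1", b"kernel v5.4", b"model weights hash"]
    chain = build_chain(root, stages)
    assert verify_chain(root, stages, chain)
    # Tampering with any stage invalidates everything after it:
    assert not verify_chain(root, [b"firmware v1", b"EVIL kernel", b"model weights hash"], chain)
```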
I guessed that’s what you meant but was curious whether I was right!
If the AI isn’t willing or able to fold itself up into something that can be run entirely on a single, human-inspectable CPU in an airgapped box, running code that is amenable to easily proving things about its behavior, you can just not cooperate with it, or not do whatever else you were planning to do by proving something about it, and just shut it off instead.
Any idea how a ‘folded-up’ AI would imply anything in particular about the ‘expanded’ AI?
If an AI ‘folded itself up’ and provably/probably ‘deleted’ its ‘expanded’ form (and all instances of it), as well as any other AIs or non-AI agents under its control, that does seem like it would be nearly “alignment-complete” (especially relative to our current AIs), even if, e.g., the AI expected to be able to escape that ‘confinement’.
But that doesn’t seem like it would work as a general procedure for AIs cooperating or even negotiating with each other.