I’ve never seen anyone address why that is not the case.
It’s solving a different problem.
Problem One: You know exactly what you want your software to do, at a level of detail sufficient to write the software, but you are concerned that you may introduce bugs in the implementation or that it may be fed bad data by a malicious third party, and that in either case terrible consequences will ensue.
Problem Two: You know in a vague, handwavy way what you want your software to do, but you don’t yet know it with enough precision to write the software. You are concerned that if you get this wrong, the software will do something subtly different from what you really wanted, and terrible consequences will ensue.
Software verification and crypto address Problem One. AI safety is an instance of Problem Two, and potentially an exceptionally difficult one.
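To make the two problems concrete, here is a toy sketch of my own (the function names are invented for illustration and taken from nothing in the discussion): for Problem One the specification can be written down and checked mechanically, while for Problem Two the missing specification is itself the hard part.

```python
from collections import Counter

# Problem One: the spec is precise; the worry is bugs in the implementation.
# "sort" must return a non-decreasing permutation of its input. That can be
# stated exactly, tested, and in principle proved.
def my_sort(xs):
    return sorted(xs)          # stand-in implementation

def meets_sort_spec(xs, ys):
    in_order = all(a <= b for a, b in zip(ys, ys[1:]))
    same_items = Counter(xs) == Counter(ys)
    return in_order and same_items

assert meets_sort_spec([3, 1, 2], my_sort([3, 1, 2]))

# Problem Two: the spec itself is the missing piece. Whatever predicate we
# write down here is exactly the thing we might get subtly wrong.
def is_what_we_really_wanted(behaviour):
    raise NotImplementedError("nobody knows how to state this precisely yet")
```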
Verification seems like a strictly simpler problem. If we can’t prove properties for a web server, how are we going to do anything about a completely unspecified AI?
The AI takeover scenarios I’ve heard almost always involve some kind of hacking, because today hacking is easy. I don’t see why that would necessarily be the case a decade from now. We could prove some operating system security guarantees, for instance.
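This isn’t purely hypothetical: the seL4 microkernel already has machine-checked proofs of functional correctness and of security properties such as integrity and confidentiality. As a rough illustration of what such a guarantee looks like (a toy model of my own, not seL4’s actual interface), the property is an invariant over every reachable state, e.g. “no process ever reads a page it holds no capability for”:

```python
# Toy kernel model (mine, not seL4's): the guarantee we would want to prove is
# the invariant that no process ever reads a page it holds no capability for.
class Kernel:
    def __init__(self):
        self.memory = {}   # page id -> contents
        self.caps = {}     # process id -> set of readable page ids

    def grant_read(self, pid, page):
        self.caps.setdefault(pid, set()).add(page)

    def read(self, pid, page):
        # The security argument rests on checks like this one being correct and
        # impossible to bypass, which is what a machine-checked proof establishes.
        if page not in self.caps.get(pid, set()):
            raise PermissionError(f"{pid} has no read capability for {page}")
        return self.memory.get(page)

k = Kernel()
k.memory["p0"] = "secret"
k.grant_read("alice", "p0")
assert k.read("alice", "p0") == "secret"
try:
    k.read("eve", "p0")      # should be refused
except PermissionError:
    pass                     # the behaviour the proved invariant guarantees
```

A proof covers every reachable state and every code path, which is far stronger than the happy-path checks at the bottom of the sketch.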
Yes, verification is a strictly simpler problem, and one that’s fairly thoroughly addressed by existing research—which is why people working specifically on AI safety are paying attention to other things.
(Maybe they should actually be working on doing verification better first, but that doesn’t seem obviously a superior strategy.)
Some AI takeover scenarios involve hacking (by the AI, of other systems). We might hope to make AI safer by making that harder, but that would require securing all the other important computer systems in the world. Even though making an AI safe is really hard, it may well be easier than that.
Yes, verification is a strictly simpler problem, and one that’s fairly thoroughly addressed by existing research—which is why people working specifically on AI safety are paying attention to other things.
This doesn’t really seem true to me. We are currently pretty bad at software verification, only able to deal with either fairly simple properties or fairly simple programs. I also think that people in verification do care about the “specification problem”, which is roughly Problem Two above (although I don’t think anyone really has that many ideas for how to address it).
I would be somewhat more convinced that MIRI was up to its mission if they could contribute to much simpler problems in prerequisite fields.
I mildly disagree. A certain amount of AI safety specifically involves trying to extend our available tools for dealing with Problem One to the situations that we expect to happen when we deal with powerful learning agents. Goal-system stability, for instance, is a matter of program verification, which is why all the papers about it deal with mathematical logic.
I haven’t read any technical papers on goal-system stability; isn’t it the case that real-world attempts at that are going to have at least as much of Problem Two as of Problem One about them? (“Internally”—in the notion of what counts as self-improvement—if not “externally” in whatever problem(s) the system is trying to solve.) I haven’t thought (or read) enough about this for my opinion to have much weight; I could well be completely wrong about it.
Regardless, you’re certainly right that Problem One is going to be important as well as Problem Two, and I should have said something like “AI safety is also an instance of Problem Two”.
isn’t it the case that real-world attempts at that are going to have at least as much of Problem Two as of Problem One about them? (“Internally”—in the notion of what counts as self-improvement—if not “externally” in whatever problem(s) the system is trying to solve.) I haven’t thought (or read) enough about this for my opinion to have much weight; I could well be completely wrong about it.
Kind of. We expect intuitively that a reasoning system can reason about its own goals and successor-agents. Problem is, that actually requires degrees of self-reference that put you into the territory of paradox theorems. So we expect that if we come up with the right way to deal with paradox theorems, the agent’s ability to “stay stable” will fall out pretty naturally.
that actually requires degrees of self-reference that put you into the territory of paradox theorems.
Oh, OK, the Löbstacle thing. You’re right, that’s a matter of program verification and as such more in the territory of Problem One than of Problem Two.
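For readers who haven’t run into it: the obstacle in question comes from Löb’s theorem. Writing \Box_T P for “theory T proves P”, a compact statement (paraphrased from memory, not quoted from any particular paper) is:

```latex
% Löb's theorem, for a consistent theory T containing enough arithmetic:
% if T proves "if T proves P, then P", then T already proves P.
\[
  \vdash_T \bigl(\Box_T P \rightarrow P\bigr)
  \;\Longrightarrow\;
  \vdash_T P
\]
% Consequence: T cannot prove the blanket soundness schema \Box_T P -> P for
% arbitrary P (taking P = \bot would make T inconsistent), so an agent
% reasoning in T cannot justify trusting a successor that reasons in T via
% the naive argument "whatever it proves is true".
```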