I haven’t read any technical papers on goal-system stability; isn’t it the case that real-world attempts at that are going to have at least as much of Problem Two as of Problem One about them? (“Internally”—in the notion of what counts as self-improvement—if not “externally” in whatever problem(s) the system is trying to solve.) I haven’t thought (or read) enough about this for my opinion to have much weight; I could well be completely wrong about it.
Regardless, you’re certainly right that Problem One is going to be important as well as Problem Two, and I should have said something like “AI safety is also an instance of Problem Two”.
isn’t it the case that real-world attempts at that are going to have at least as much of Problem Two as of Problem One about them? (“Internally”—in the notion of what counts as self-improvement—if not “externally” in whatever problem(s) the system is trying to solve.) I haven’t thought (or read) enough about this for my opinion to have much weight; I could well be completely wrong about it.
Kind of. We expect intuitively that a reasoning system can reason about its own goals and successor-agents. Problem is, that actually requires degrees of self-reference that put you into the territory of paradox theorems. So we expect that if we come up with the right way to deal with paradox theorems, the agent’s ability to “stay stable” will fall out pretty naturally.
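For concreteness, and on the assumption (borne out by the mention of the Löbstacle further down) that the paradox theorems in question are Löb’s theorem and its relatives, the relevant statement is roughly:

Löb’s theorem: for any sentence φ, if T ⊢ □φ → φ, then T ⊢ φ,

where T is the agent’s proof system and □φ abbreviates T’s arithmetized provability predicate applied to φ. That is, T can endorse “provable-in-T implies true” for a given φ only in the degenerate case where it already proves φ outright.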
that actually requires degrees of self-reference that put you into the territory of paradox theorems.
Oh, OK, the Löbstacle thing. You’re right, that’s a matter of program verification and as such more in the territory of Problem One than of Problem Two.
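For readers who haven’t met it, a minimal sketch of the Löbstacle in that verification framing (the framing is an interpolation here, not spelled out in the thread; safe(a) is a made-up name for whatever safety property the verifier checks): the agent reasons in T and licenses an action a only once it has a T-proof of safe(a). To sign off on a successor that reasons the same way, it would need the reflection schema

T ⊢ □safe(a) → safe(a), for arbitrary a,

but by Löb’s theorem T proves an instance of that schema only when it already proves safe(a) itself. So a consistent agent can’t blanket-trust proofs produced by its own proof system one level down, which is why this lands in program-verification (Problem One) territory.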