In this post, it seems like ontology identification is acting as a partial solution to the easy goal inference problem, by explaining how the human’s behavior would change if their model of the world changed. I would be more interested in seeing any credible strategy for the easy goal inference problem. I don’t really see how we are going to solve that problem in a way that doesn’t incidentally address ontology identification, and I don’t really see how a solution to ontology identification is going to help with that problem.
This may reflect a more general methodological difference. My impression is that the MIRI view is something like “obviously any solution will have to address X, so X is a reasonable way to approach the big problem,” for X = ontology identification, tiling, “realistic world models,” etc.
My view is more like “obviously any solution to the big problem will have to address the ontology identification problem anyway, so it doesn’t matter much whether we find some special-case solution to it on its own.”
Both situations come up a lot in computer science, and I take both approaches a ton in my own research. I don’t have a clean story about when one or the other is the more appropriate response, but in these particular cases I have pretty strong intuitions.
Regarding the methodological difference: my perspective is that head-on attempts to solve AI safety are not very promising, since we lack the tools to answer basic questions in the general area of the problem, such as “what is intelligence?”, “what is the computational resource cost of constructing an agent of given intelligence?”, or “what is the growth curve of self-improving agents?” Therefore, what we should be doing is constructing a general theory capable of answering questions of this type (I would call it abstract intelligence theory). Thinking about problems such as naturalized induction and Vingean reflection seems to me a useful way to approach this, not because they are subproblems of the AI safety problem but because they are handles for getting a mathematical grip on the entire area.
Good point. In this post I was trying to solve something like the easy goal inference problem by factoring it into some different problems, including ontology identification, but it’s not clear whether this is the right factoring.
It seems like your intuition is something like: “any way of correctly modelling human mistakes when the human and AI share an ontology will also correctly handle mistakes arising from the human and AI having different ontologies”. I think I mostly agree with this intuition. My motivation for working on ontology identification despite this intuition is some combination of (a) “easy” versions of ontology identification seem useful outside the domain of value learning (e.g. making a genie for concrete physical tasks), and (b) I don’t see many promising approaches for directly attacking the easy goal inference problem.
But after writing this, I think I have updated towards looking for approaches to the easy goal inference problem that avoid ontology identification. The most promising thing I can think of right now is some variation on planning algorithm 2, adjusted so that the planning can take the AI’s different predictions into account (but not the AI’s internal representation). It does seem plausible that something in this space would work without directly solving the ontology identification problem.
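To gesture at the shape of this, here is a minimal sketch in Python (with entirely hypothetical names like `plan_with_opaque_model`, `predict_observations`, and `human_value_estimate`, none of which come from the original post) of planning that consults the AI’s world model only through its predictions of human-legible observations, never through its internal state:

```python
# Minimal sketch only: all names are hypothetical placeholders for this
# illustration, not definitions from the post or any existing library.
from typing import Any, Callable, Sequence

Action = Any
Observation = Any


def plan_with_opaque_model(
    candidate_plans: Sequence[Sequence[Action]],
    predict_observations: Callable[[Sequence[Action]], Sequence[Observation]],
    human_value_estimate: Callable[[Sequence[Observation]], float],
) -> Sequence[Action]:
    """Choose the plan whose predicted observations the human value model rates highest.

    `predict_observations` wraps the AI's world model: internally it may use
    any ontology at all, but the only thing exposed here is its prediction of
    human-legible observations for a given plan. `human_value_estimate` is the
    (hypothetical) model of human preferences over those observations, so no
    mapping between the AI's internal representation and the human's ontology
    is ever constructed.
    """
    best_plan, best_score = None, float("-inf")
    for plan in candidate_plans:
        observations = predict_observations(plan)    # AI's predictions, human-legible outputs
        score = human_value_estimate(observations)   # evaluated in the human's ontology
        if score > best_score:
            best_plan, best_score = plan, score
    return best_plan
```

The design point is simply that `human_value_estimate` is learned and evaluated entirely in the human’s ontology, so nothing here requires translating between the AI’s internal representation and the human’s.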
“I don’t see many promising approaches for directly attacking the easy goal inference problem.”
I would agree with that. But I don’t see how this situation will change no matter what you learn about ontology identification. It looks to me like the easy goal inference problem is probably just ill-posed/incoherent, and we should avoid any approach that rests on us solving it. The kind of insight that would be required to change this view looks very unlike the kind required to solve ontology identification, and also unlike the kind required to make conventional progress in AI, so to the extent that there are workable approaches to the easy goal inference problem, it seems like we could work towards them now. If we can’t see how to attack the problem now, then by the same token I’m pessimistic that workable approaches exist.
On that perspective we might ask: how are we going to avoid the problem? The dodges I know of would also dodge ontology identification, by cashing everything out in terms of human behavior. It’s harder for me to know what the situation is like for solutions to the goal inference problem, since I don’t yet see any plausible solution strategies, but I would guess that the situation will turn out to be similar.