What I was mostly referring to in my comment was the ontology problem for agents with high-reductive-level motivations. Example: a robot built to make people happy has to be able to find happiness somewhere in its world model, but a robot built to make itself smarter has no such need. So if you want a robot to make people happy, using world-models built to make a robot smarter, the happiness maximizer is going to need to be able to find happiness inside an unfamiliar ontology.
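To make that binding problem concrete, here's a minimal Python sketch; every class and field name in it is made up for illustration, not anything from an actual system. The point is just that a motivation written against one world model's vocabulary has nothing to point at when the world model is swapped for one built around different primitives.

```python
# Purely illustrative sketch; all class and field names are hypothetical.

class FolkPsychologyModel:
    """World model whose predicted states carry an explicit 'happiness' feature."""
    def predict(self, action):
        return {"happiness": 0.7, "people_affected": 3}

class CompressedPhysicsModel:
    """World model built for a different goal (say, self-improvement);
    its states are unlabelled feature vectors with no 'happiness' anywhere."""
    def predict(self, action):
        return {"state_vector": [0.12, -3.4, 8.1]}

def happiness_utility(predicted_state):
    # Only works if the ontology exposes the concept the motivation refers to.
    if "happiness" not in predicted_state:
        raise KeyError("no 'happiness' in this ontology; the agent would need "
                       "some mapping to re-locate it")
    return predicted_state["happiness"]

print(happiness_utility(FolkPsychologyModel().predict("tell a joke")))  # 0.7
try:
    happiness_utility(CompressedPhysicsModel().predict("tell a joke"))
except KeyError as e:
    print(e)  # the motivation has nothing to bind to in the new ontology
```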
More exposition about why world models will end up different:
Recently I’ve been trying to think about why building inherently lossy predictive models of the future is a good idea. My current thesis statement is that since a model’s computation is much more valuable finished than unfinished, it’s okay for the model to be lossy as long as it finishes. The trouble is quantifying this.
For the current purpose, though, the details are not so important. Supposing one understands the uncertainty characteristics of various models, one chooses a model by maximizing an effective expected value, because inaccurate predictive models have some associated cost that depends on the agent’s preferences. Agents with different preferences will pick different methods of predicting the future, even if they’re locked into the same ontology, and so anything not locked in is fair game to vary widely.
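As a toy illustration of that last point (the numbers, names, and scoring rule below are my own assumptions, not anything derived): two agents scoring the same candidate models, but trading prediction error against compute cost according to their own preferences, can end up selecting different models.

```python
# Toy sketch of preference-dependent model selection (all numbers assumed).
candidate_models = {
    "coarse_fast":   {"expected_error": 0.30, "compute_cost": 1.0},
    "detailed_slow": {"expected_error": 0.05, "compute_cost": 20.0},
}

def effective_value(stats, error_penalty, compute_penalty):
    """Higher is better: penalize inaccuracy and compute time
    according to the agent's own preferences."""
    return -(error_penalty * stats["expected_error"]
             + compute_penalty * stats["compute_cost"])

def pick_model(error_penalty, compute_penalty):
    return max(candidate_models,
               key=lambda name: effective_value(candidate_models[name],
                                                error_penalty, compute_penalty))

# An agent whose goal hinges on fine-grained outcomes (e.g. human happiness)
# pays dearly for prediction error; a self-improvement-focused agent may not.
print(pick_model(error_penalty=100.0, compute_penalty=1.0))  # detailed_slow
print(pick_model(error_penalty=5.0,   compute_penalty=1.0))  # coarse_fast
```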
“the happiness maximizer is going to need to be able to find happiness inside an unfamiliar ontology.”
But the module for predicting human behaviour/preferences should surely be the same in a different ontology? The module is a model, and the model is likely not grounded in the fine detail of the ontology.
Example: the law of comparative advantage in economics is a high-level model, and it doesn’t collapse just because the fundamental ontology turns out to be relativity rather than Newtonian mechanics. Even in a different ontology, humans should remain (by far) the best things in the world that approximate the “human model”.
If there is a module that specifically requires predicting human behavior, sure. My claim in the second part of my comment is that if the model is built to predict the number of paperclips, the closest internal match to something that functions like human decision-making is not necessarily a useful predictive model of human decisions.