The basic problem is the endemic confusion between the map (the UF as a way of modelling an entity) and the territory (the UF as an architectural feature that makes certain things happen).
The fact that there are multiple ways of modelling humans as UF-driven, and the fact that they are all a bit contrived, should be a hint that there may be no territory corresponding to the map.
Is there an article that presents multiple models of UF-driven humans and demonstrates that what you criticize as contrived actually shows there is no territory to correspond to the map? Right now your statement doesn’t have enough detail for me to be convinced that UF-driven humans are a bad model.
And you didn’t answer my question: is there another way, besides UFs, to guide an agent towards a goal? It seems to me that the idea of moving toward a goal implies a utility function, be it hunger or human-programmed.
Rather than trying to prove the negative, it is more a question of whether these models are known to be useful.
The idea of multiple or changing UFs suffers from a falsifiability problem as well. Whenever a human changes their apparent goals, is that a switch to another UF, or a change in the same UF? It is reminiscent of Ptolemaic epicycles, as Ben Goertzel says.
Implies what kind of UF?
If you are arguing tautologously that having a UF just is having goal-directed behaviour, then you are not going to be able to draw interesting conclusions. If you are going to define “having a UF” broadly, then you are going to have similar problems; in particular, the claim that “the problem of making an AI safe simplifies to the problem of making its UF safe” only holds for certain, relatively narrow, definitions of UF. In the context of a biological organism, or an artificial neural net or deep-learning AI, the only thing “UF” could mean is some aspect of its functioning that is entangled with all the others. Neither a biological organism nor an artificial neural net or deep-learning AI is going to have a UF that can be conveniently separated out and reprogrammed. That definition of UF only belongs in the context of GOFAI or symbolic programming.
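To make the contrast concrete, here is a minimal, purely illustrative Python sketch (the state fields and both utility functions are invented for the example, not anyone’s actual proposal): a symbolic-style agent exposes its UF as a separate, swappable component, which is exactly what a trained network does not.

```python
# Purely illustrative: in a GOFAI/symbolic-style agent, the utility function
# is an explicit, separable component that can be inspected and swapped out.

def u_original(state: dict) -> float:
    """One explicit utility function over world-states."""
    return float(state.get("resources", 0))

def u_revised(state: dict) -> float:
    """A 'safer' utility function can simply be plugged in in its place."""
    return float(state.get("resources", 0)) - 10.0 * float(state.get("harm", 0))

def choose_action(predicted_states: dict, utility) -> str:
    """Pick the action whose predicted successor state the given UF scores highest."""
    return max(predicted_states, key=lambda action: utility(predicted_states[action]))

# Making this agent "safe" is, roughly, a matter of replacing u_original with
# u_revised. A trained neural net offers no analogous move: whatever plays the
# role of a "UF" is smeared across all of its weights, so there is no single
# component to locate and reprogram.
```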
There is no point in defining a term broadly to make one claim come out true if that claim is only an intermediate step towards some other claim which doesn’t come out as true under the broad definition.
My definition of utility function is one commonly used in AI. It is a mapping from states to real numbers, u: E → R, where E is the set of all possible states and R is the reals in one dimension.
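For concreteness, a minimal sketch of that definition in Python; the toy state space and the numbers assigned to each state are placeholders chosen for illustration only.

```python
# A minimal sketch of u : E -> R. E here is a toy, finite set of states, and
# the values of u are arbitrary; in practice u returns a float, i.e. a finite
# approximation of a real number.

E = {"hungry", "fed", "asleep"}

u = {
    "hungry": -1.0,
    "fed": 2.0,
    "asleep": 0.5,
}

def utility(state: str) -> float:
    """Evaluate u at a single state in E."""
    return u[state]

# An agent built around such a u prefers whichever reachable state maximises it:
best_state = max(E, key=utility)  # "fed"
```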
What definition are you using? I don’t think we can have a productive conversation until we both understand each other’s definitions.
I’m not using a definition; I’m pointing out that standard arguments about UFs depend on ambiguities.
Your definition is abstract and doesn’t capture anything that an actual AI could “have”: for one thing, you can’t compute the reals. It also fails to capture what UFs are “for”.
Go read a textbook on AI. You clearly do not understand utility functions.
AI researchers, a group of people who are fairly disjoint from LessWrongians, may have a rigorous and stable definition of UF, but that is not relevant. The point is that writings on MIRI and LessWrong use, and in fact depend on, shifting and ambiguous definitions.