The basic problem is the endemic confusion between the map (the UF as a way of modelling an entity) and the territory (the UF as an architectural feature that makes certain things happen).
It seems to you that entities with simple and obvious goal-directed behaviour (as seen from the outside) have or need UFs, and entities that don’t, don’t. But there isn’t a fixed connection between the way things seem from the outside and the way they work.
From the outside, any system that succeeds in doing anything specialised can be thought of, or described, as a relatively general-purpose system that has been constrained down to a narrower goal by some other system. For instance, a chess-playing system may be described as a general-purpose problem-solver that has been trained on chess. To say its UF defines a goal of winning at chess is the “map” view.
However, in terms of the territory, of what is actually going on inside the black box, it might well be a special-purpose system that has been specifically coded for chess, has no ability to do anything else, and therefore does not need any kind of reward channel or training system to keep it focused on chess. So the mere fact that a system, considered from the outside as a black box, does some specific thing is not proof that it has a UF, and therefore not proof that anyone has succeeded in loading values or goals into its UF.
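To make the distinction concrete, here is a minimal sketch, not a description of any real engine, of two players that are indistinguishable through their black-box interface even though only one of them contains a UF as an actual component. The positions, evaluations, and class names are purely illustrative assumptions.

```python
# A toy "game tree": position -> candidate moves with made-up evaluations.
CANDIDATES = {
    ("e4",): {"c5": 0.54, "e5": 0.50, "a6": 0.31},
}


class UtilityMaximiser:
    """The 'UF in the territory' case: a component that is explicitly maximised."""

    def utility(self, position, move):
        # Stand-in for "estimated value of the position after this move".
        return CANDIDATES[position][move]

    def best_move(self, position):
        moves = CANDIDATES[position]
        return max(moves, key=lambda m: self.utility(position, m))


class HardcodedPlayer:
    """The special-purpose case: no UF, no reward channel, just canned answers."""

    BOOK = {("e4",): "c5"}  # responses baked in directly by the programmer

    def best_move(self, position):
        return self.BOOK[position]


# From the outside, on the positions they both handle, the two are identical:
for player in (UtilityMaximiser(), HardcodedPlayer()):
    assert player.best_move(("e4",)) == "c5"
```

The point of the sketch is only that the best_move interface by itself cannot tell you which internal story is the true one.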
Taking an outside view of a system as possessing a UF (in the spirit of Dennett’s “intentional stance”) will only give correct predictions if everything works correctly. The essential point is that you need a fully accurate picture of what is going on inside a black box in order to predict its behaviour under all circumstances, but pictures that are inaccurate in various ways can be good enough for restricted sets of circumstances.
Here’s an analogy: suppose that machinery, including domestic appliances, were made of an infinitely malleable substance called Ultronium, say, and were constrained into some particular form, such as a kettle or toaster, by a further gadget called a Veeblefetzer. So long as a kettle functions as a kettle, I can regard it as an Ultronium+Veeblefetzer ensemble. However, such ensembles support different counterfactuals from real kettles. For instance, if the veeblefetzer on my kettle fritzes, it could suddenly reconfigure into something else, a toaster or a spice rack, but that is not possible for an ordinary kettle that is not made of Ultronium.
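A throwaway sketch of the same point, using the made-up Ultronium and Veeblefetzer from the analogy: the two models of the kettle agree on every prediction while the appliance is working, and only come apart on the failure counterfactual.

```python
def ordinary_kettle(veeblefetzer_ok: bool) -> str:
    # An ordinary kettle stays a kettle regardless of a part it never had.
    return "kettle"


def ultronium_ensemble(veeblefetzer_ok: bool) -> str:
    # The ensemble is only a kettle while the constraining gadget holds it in shape.
    return "kettle" if veeblefetzer_ok else "something else entirely"


assert ordinary_kettle(True) == ultronium_ensemble(True)    # agree while it works
assert ordinary_kettle(False) != ultronium_ensemble(False)  # diverge when it fritzes
```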
The converse case to an entity seeming to have a UF just because it fulfils some apparent purpose is an entity that seems not to have a UF because its behaviour is complex and perhaps seemingly random. A UF in the territory sense does not have to be simple, and a complex UF can include higher-level goals, such as “seek variety” or “revise your lower-level goals from time to time”, so the lack of an obvious UF as judged externally does not imply the lack of a UF in the gold-standard sense of an actual component. A sketch of this follows below.
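Here is a minimal sketch of that, assuming nothing beyond the toy numbers shown: a UF that is a genuine component yet includes a variety-seeking term, so that the resulting behaviour shows no single obvious external goal.

```python
from collections import Counter

OPTIONS = {"read": 0.6, "walk": 0.5, "cook": 0.4}  # fixed base preferences
VARIETY_WEIGHT = 0.3                               # penalty per previous repetition
history = Counter()


def utility(option: str) -> float:
    # Base preference minus a penalty for how often the option was chosen before.
    return OPTIONS[option] - VARIETY_WEIGHT * history[option]


def choose() -> str:
    choice = max(OPTIONS, key=utility)
    history[choice] += 1
    return choice


print([choose() for _ in range(6)])
# -> ['read', 'walk', 'cook', 'read', 'walk', 'cook']: the agent cycles through
# varied activities rather than visibly pursuing one goal, yet a UF is being
# maximised, as an actual component, at every single step.
```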
The actual possession of a UF is much more relevant to AI safety than being describable in terms of a UF. If an AI doesn’t actually have a UF, you can’t render it safe by fixing its UF.