You may be interested in this article:
Model-Based Utility Functions
Orseau and Ring, as well as Dewey, have recently described problems, including self-delusion, with the behavior of agents using various definitions of utility functions. An agent’s utility function is defined in terms of the agent’s history of interactions with its environment. This paper argues, via two examples, that the behavior problems can be avoided by formulating the utility function in two steps: 1) inferring a model of the environment from interactions, and 2) computing utility as a function of the environment model. Basing a utility function on a model that the agent must learn implies that the utility function must initially be expressed in terms of specifications to be matched to structures in the learned model. These specifications constitute prior assumptions about the environment, so this approach will not work with arbitrary environments. But the approach should work for agents designed by humans to act in the physical world. The paper also addresses the issue of self-modifying agents and shows that if provided with the possibility to modify their utility functions, agents will not choose to do so, under some usual assumptions.
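For concreteness, here is a minimal toy sketch of the abstract's two-step formulation (learn an environment model from the interaction history, then score the model rather than the raw observations). The helper names `learn_model` and `spec_match`, and the Counter-based "model", are illustrative assumptions of the sketch, not the paper's notation.

```python
from collections import Counter

def learn_model(history):
    # Step 1: infer a (toy) environment model from the interaction history.
    # Here the "model" is just a count of observed symbols; a real agent
    # would infer a much richer structure.
    return Counter(obs for (_action, obs) in history)

def spec_match(env_model, spec):
    # Match a specification (here: a symbol we care about) to a structure
    # in the learned model.
    return env_model.get(spec, 0)

def model_based_utility(history, specs, utility_of_model):
    # Step 2: compute utility as a function of the learned model,
    # not of the raw observation history.
    env_model = learn_model(history)
    grounded = {name: spec_match(env_model, s) for name, s in specs.items()}
    return utility_of_model(env_model, grounded)

# Toy usage: utility counts how often the "person_ok" structure appears
# in the inferred model.
history = [("noop", "person_ok"), ("move", "wall"), ("noop", "person_ok")]
print(model_based_utility(history, {"person": "person_ok"},
                          lambda model, g: g["person"]))  # -> 2
```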
Also, regarding this part of your post:
For example: moving yourself in space (in a certain speed range)
This range is quite huge. In certain contexts, you’d want to be moving through space at high fractions of the speed of light, rather than walking speed. Same goes for moving other objects through space. Btw, would you count a data packet as an object you move through space?
staying in a single spot (for a certain time range)
Hopefully the AI knows you mean moving in sync with Earth’s movement through space.
Thank you for actually engaging with the idea (pointing out problems and whatnot) rather than just suggesting reading material.
Btw, would you count a data packet as an object you move through space?
A couple of points:
I only assume AI models the world as “objects” moving through space and time, without restricting what those objects could be. So yes, a data packet might count.
“Fundamental variables” don’t have to capture all typical effects of humans on the world; they only need to capture typical human actions which humans themselves can easily perceive and comprehend. So the fact that a human can send an Internet message at 2/3 the speed of light doesn’t mean that “2/3 the speed of light” should be included in the range of fundamental variables, since humans can’t move and react at such speeds.
Conclusion: data packets can be seen as objects, but there are many other objects which are much easier for humans to interact with.
Also note that fundamental variables are not meant to be some kind of “moral speed limits”, prohibiting humans or AIs from acting at certain speeds. Fundamental variables are only needed to figure out what physical things humans can most easily interact with (because those are the objects humans are most likely to care about).
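To make this concrete, here is an illustrative sketch of fundamental variables used as a filter rather than a limit: keep the objects whose typical motion falls inside human-scale ranges. The ranges, field names, and world-model format below are all assumptions of the sketch, not part of the proposal.

```python
# "Fundamental variables" as a filter for finding objects humans can easily
# interact with, not as a speed limit. All numbers are illustrative.

HUMAN_SPEED_RANGE_M_S = (0.0, 12.0)        # roughly standing still to sprinting
HUMAN_TIMESCALE_RANGE_S = (0.1, 8 * 3600)  # reaction time up to hours

def easily_interactable(obj):
    lo_v, hi_v = HUMAN_SPEED_RANGE_M_S
    lo_t, hi_t = HUMAN_TIMESCALE_RANGE_S
    return (lo_v <= obj["typical_speed_m_s"] <= hi_v
            and lo_t <= obj["typical_timescale_s"] <= hi_t)

world_model = [
    {"name": "coffee mug",  "typical_speed_m_s": 0.5, "typical_timescale_s": 5.0},
    {"name": "data packet", "typical_speed_m_s": 2e8, "typical_timescale_s": 1e-3},
]

print([o["name"] for o in world_model if easily_interactable(o)])
# -> ['coffee mug']: the packet still counts as an object, it just isn't
#    something humans can directly perceive or manipulate at its native speed.
```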
This range is quite huge. In certain contexts, you’d want to be moving through space at high fractions of the speed of light, rather than walking speed. Same goes for moving other objects through space.
What contexts do you mean? Maybe my point about “moral speed limits” addresses this.
Hopefully the AI knows you mean moving in sync with Earth’s movement through space.
Yes, relativity of motion is a problem which needs to be analyzed. Fundamental variables should refer to relative speeds/displacements or something.
The paper is surely at least partially relevant, but what’s your own opinion on it? I’m confused about this part: (4.2 Defining Utility Functions in Terms of Learned Models)
For example a person may be specified by textual name and address, by textual physical description, and by images and other recordings. There is very active research on recognizing people and objects by such specifications (Bishop, 2006; Koutroumbas and Theodoridis, 2008; Russell and Norvig, 2010). This paper will not discuss the details of how specifications can be matched to structures in learned environment models, but assumes that algorithms for doing this are included in the utility function implementation.
Does it just completely ignore the main problem?
I know Abram Demski wrote about Model-based Utility Functions, but I couldn’t fully understand his post either.
(Disclaimer: I’m almost mathematically illiterate, except for knowing a lot of mathematical concepts from popular materials: the halting problem, Gödel, uncountability, ordinals vs. cardinals, etc.)
Also note that fundamental variables are not meant to be some kind of “moral speed limits”, prohibiting humans or AIs from acting at certain speeds. Fundamental variables are only needed to figure out what physical things humans can most easily interact with (because those are the objects humans are most likely to care about).
Ok, that clears things up a lot. However, I still worry that if it’s at the AI’s discretion when and where to sidestep the fundamental variables, we’re back at the regular alignment problem. You have to be reasonably certain what the AI is going to do in extremely out of distribution scenarios.
The subproblem of environmental goals is just to make AI care about natural enough (from the human perspective) “causes” of sensory data, not to align AI to the entirety of human values. Fundamental variables have no (direct) relation to the latter problem.
However, fundamental variables would be helpful for defining impact measures if we had a principled way to differentiate “times when it’s OK to sidestep fundamental variables” from “times when it’s NOT OK to sidestep fundamental variables”. That’s where the things you’re talking about definitely become a problem. Or maybe I’m confused about your point.
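To show where the difficulty sits, here is a toy sketch of an impact penalty built on a fundamental variable, with the unsolved part (deciding when sidestepping is acceptable) left as an explicit placeholder predicate. All names and numbers are illustrative assumptions, not a worked-out proposal.

```python
# Toy sketch: an impact penalty based on a fundamental variable, where
# `sidestep_allowed` is exactly the part we do not yet know how to specify
# in a principled way.

FUNDAMENTAL_SPEED_RANGE_M_S = (0.0, 12.0)  # same human-scale range as above

def sidestep_allowed(context):
    # Placeholder for the open question: when is it OK for an action to push
    # things outside the fundamental-variable ranges?
    return context.get("explicit_human_request", False)

def impact_penalty(resulting_speed_m_s, context):
    lo, hi = FUNDAMENTAL_SPEED_RANGE_M_S
    if lo <= resulting_speed_m_s <= hi or sidestep_allowed(context):
        return 0.0
    # Penalize by how far outside the human-scale range the outcome falls.
    return min(abs(resulting_speed_m_s - hi), abs(resulting_speed_m_s - lo))

print(impact_penalty(5.0, {}))                                  # 0.0
print(impact_penalty(300.0, {}))                                # 288.0
print(impact_penalty(300.0, {"explicit_human_request": True}))  # 0.0
```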
Thanks. That makes sense.