That’s why I used the “(idealised) agent” description (but titles need to be punchier).
Though I think “simple” goal is incorrect. The goal can be extremely complex—much more complex than human preferences. There’s no limit to the subtleties you can pack into a utility function. There is a utility function that will fit perfectly to every decision you make in your entire life, for example.
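(To sketch one deliberately trivial way that can go—my notation, just to make the point:)

```latex
% Let $h$ range over complete observation--action histories, and let
% $h^\ast$ be the history you actually live out. Define
\[
  U(h) \;=\;
  \begin{cases}
    1 & \text{if } h \text{ agrees with } h^\ast \text{ on every action taken,}\\
    0 & \text{otherwise.}
  \end{cases}
\]
% Your actual policy maximises expected $U$, so $U$ "fits perfectly" every
% decision you ever make -- at the cost of saying nothing beyond
% "you did what you did".
```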
The reason to look for an idealised agent, though, is that a utility function is stable in a way that humans are not. If there is some stable utility function that encompasses human preferences (it might be something like “this is the range of human preferences” or similar) then, if given to an AI, the AI will not seek to transform humans into something else in order to satisfy our “preferences”.
The AI has to be something of an agent, so its model of human preferences has to be an agent-ish model.
“There is a utility function that will fit perfectly to every decision you make in your entire life, for example.”
Sure, but I don’t care about that. If two years from now a random glitch causes me to do something a bit different, which means that my full set of actions matches some slightly different utility function, I will not care at all.
Is that really the standard definition of agent though? Most textbooks I’ve seen talk of agents working towards the achievement of a goal, but it says nothing about the permanence of that goal system. I would expect an “idealized agent” to always take actions that maximize likelihood of achieving its goals, but that is orthogonal from whether the system of goals changes.
Then take my definition of agent in this post as “expected utility maximiser with a clear and distinct utility that is, in practice, Cartesianly separated from the rest of the universe”, and I’ll try and be clearer in subsequent posts.
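A toy sketch of the kind of thing I mean (illustrative only; the environment, names, and numbers are all made up): the utility function is a fixed, separate object, and the agent’s only job is to pick the action with the highest expected utility under its world model.

```python
# Toy "Cartesian" expected utility maximiser: the utility function is a
# fixed object, separate from the world model and from the environment.

# World model: for each action, a probability distribution over outcomes.
WORLD_MODEL = {
    "press_button": {"cake": 0.7, "nothing": 0.3},
    "wait":         {"cake": 0.1, "nothing": 0.9},
}

# The clear and distinct utility function, defined over outcomes only.
UTILITY = {"cake": 10.0, "nothing": 0.0}

def expected_utility(action):
    """Expected utility of an action under the agent's world model."""
    return sum(p * UTILITY[outcome]
               for outcome, p in WORLD_MODEL[action].items())

def choose_action():
    """The agent: always pick the action maximising expected utility."""
    return max(WORLD_MODEL, key=expected_utility)

print(choose_action())  # -> "press_button" (expected utility 7.0 vs 1.0)
```

The point of the toy version is just that `UTILITY` sits outside the world model and nothing the agent does can rewrite it, which is the stability property I’m after.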