This is a knowledge doubt: reading your description of a value learning system, it feels like what it has above and beyond the reinforcement learner is a model not only of the other being, but of its goals.
In Dennett parlance, it has two levels of intentionality: I think that you want that the toy be built.
In psychology parlance, it has somewhat sophisticated theory of mind.
In philosophical terms it distinguishes intensions from extensions.
Are these correct inferences from being a value learner?
Are the kids in this video Value Learners or Reinforcement Learners? What about the chimps?
https://www.youtube.com/watch?v=6zSut-U1Iks
What Dan Dewey describes as an optimal value learner is not what either kids or chimps do: replacing the reinforcement learner's sum of rewards with an expected utility over a pool of possible utility functions gives an optimality notion for a value-learning agent.
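As I read that notion, it looks roughly like this in my own simplified, single-step notation (Dewey's actual formulation is defined over whole interaction histories, so treat this as a sketch rather than his formula): the agent scores each action by its expected utility averaged over the whole pool of candidate utility functions,

$$a^{*} = \arg\max_{a} \sum_{U \in \mathcal{U}} P(U \mid \text{evidence}) \; \mathbb{E}\big[U(\text{outcome}) \mid a\big],$$

whereas the maxing I describe next would first fix $\hat{U} = \arg\max_{U \in \mathcal{U}} P(U \mid \text{evidence})$ and only then choose $\arg\max_{a} \mathbb{E}\big[\hat{U}(\text{outcome}) \mid a\big]$.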
That is because when we infer goals from others, we are not expectimaxing over the possible goals the agent could have; we are simply maxing. The kids assume only the goal with the highest likelihood.
That’s probably the correct inference, if I understand you. The value learner has priors over what the world is like, and further priors over what is valuable.
The kids and the chimps both already have values, and are trying to learn how to fulfil them.
I don’t follow your other points, sorry.
The kids and chimps have different priors. Kids assume the experimenter has reasons for doing the weird, seemingly non-goal-oriented things he does. Humans alone can entertain fictions. This makes us powerful, but also more prone to superstitious behavior (in behaviorist terminology).
If you were expectimaxing over what an agent would do (which is what Dewey suggests a value learner does), you'd end up with behaviors that are seldom useful: some parts of your behavior would further one goal and other parts another, and you would never commit to the full set of behaviors that furthers the one goal you assign the highest likelihood of being valuable. Maxing means finding the highest-likelihood goal, optimizing for it, and ignoring all the others; expectimaxing gives a mixed hybrid, which fails whenever the payoffs are all-or-none.
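Here is a toy numerical sketch of that all-or-none point, with goals, plans, and payoffs invented purely for illustration (nothing in it comes from Dewey's paper, and it does not model the step-by-step mixing described above; it only shows why a hedged plan earns nothing when payoffs are all-or-nothing, while committing to the most likely goal earns its full payoff most of the time):

```python
# Toy illustration of the "all or none" point. Goals, plans, and payoffs are
# invented for this example.

# Posterior over which goal the demonstrator actually has.
goal_probs = {"open_the_box": 0.6, "make_the_noise": 0.4}

# All-or-nothing payoffs: a plan pays off under a goal only if it commits
# fully to that goal; a plan that splits its effort completes neither.
payoffs = {
    "commit_to_opening": {"open_the_box": 10, "make_the_noise": 0},
    "commit_to_noise":   {"open_the_box": 0,  "make_the_noise": 10},
    "split_the_effort":  {"open_the_box": 0,  "make_the_noise": 0},
}

def expected_payoff(plan):
    """Average a plan's payoff over the posterior on goals."""
    return sum(goal_probs[g] * payoffs[plan][g] for g in goal_probs)

# "Maxing": assume the single most likely goal and optimize for it alone.
map_goal = max(goal_probs, key=goal_probs.get)
maxing_plan = max(payoffs, key=lambda plan: payoffs[plan][map_goal])

for plan in payoffs:
    print(plan, expected_payoff(plan))  # 6.0, 4.0, 0.0 respectively
print("maxing picks:", maxing_plan)     # maxing picks: commit_to_opening
```

Note that when comparing these three complete plans, even an expectimaxer would avoid the split; the mixing I have in mind shows up when the averaging happens inside the plan, step by step, rather than between whole plans.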
No doubt this is not my most eloquent thread in history. Sorry, give up on this if you don’t understand it.