It was not my intention to imply that semantic structure is never needed—I was just saying that the pedestrian example does not indicate the need for semantic structure. I would generally like to minimize the use of semantic structure in impact measures, but I agree it’s unlikely we can get away without it.
There are some kinds of semantic structure that the agent can learn without explicit human input, e.g. by observing how humans have arranged the world (as in the RLSP paper). I think it’s plausible that agents can learn the semantic structure that’s needed for impact measures through unsupervised learning about the world, without relying on human input. This information could be incorporated into the weights that the deviation measure assigns to reaching different states or to satisfying different utility functions (e.g. states where pigeons / cats are alive).
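To make this concrete, here is a minimal sketch of the kind of weighting I have in mind, assuming the weights over states have already been learned somehow; the state names, numbers, and the simple one-sided deviation are all illustrative assumptions, not anything taken from the RLSP paper or from a specific published impact measure.

```python
def weighted_deviation(reachability_now, reachability_baseline, weights):
    """Sum the weighted drops in reachability of each state vs. the baseline."""
    return sum(
        weights[s] * max(0.0, reachability_baseline[s] - reachability_now[s])
        for s in weights
    )

# Hypothetical learned weights: states where the pigeons / cat are alive
# matter more than the vase staying intact.
weights = {"pigeons_alive": 1.0, "cat_alive": 1.0, "vase_intact": 0.3}
baseline = {"pigeons_alive": 0.9, "cat_alive": 0.95, "vase_intact": 1.0}
after_action = {"pigeons_alive": 0.9, "cat_alive": 0.2, "vase_intact": 1.0}

print(weighted_deviation(after_action, baseline, weights))  # 0.75: harming the cat dominates
```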
Thanks for the clarification; I think our intuitions about how far you could take these techniques may be more similar than was apparent from the earlier comments.
You bring up the distinction between semantic structure that is learned via unsupervised learning and semantic structure that comes from ‘explicit human input’. We may be using the term ‘semantic structure’ in somewhat different ways when it comes to the question of how much of it you are actually creating in certain setups.
If you set things up to create an impact metric via unsupervised learning, you still need to hand-encode some kind of impact metric on the world state to go into the agent’s reward function, e.g. you may encode ‘bad impact’ as the observable signal ‘the owner of the agent presses the do-not-like feedback button’. For me, that setup uses a form of indirection to create an impact metric that is incredibly rich in semantic structure: it indirectly incorporates the impact-related semantic knowledge that is in the owner’s brain.

You might say instead that the metric does not have rich semantic structure at all, because it is just a bit from a button press. For me, an impact metric defined as ‘not too different from the world state that already exists’ would likewise encode a huge amount of semantic structure, if the world we are talking about is not a toy world but the real world.
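To illustrate the indirection: the hand-coded part of such a setup can be as small as a single bit, with all of the semantic richness supplied by the owner’s judgement about when to press the button. A minimal sketch, where the function name and penalty size are my own hypothetical choices:

```python
def shaped_reward(task_reward: float, button_pressed: bool,
                  impact_penalty: float = 10.0) -> float:
    """Task reward minus a fixed penalty whenever the owner signals ‘bad impact’."""
    return task_reward - (impact_penalty if button_pressed else 0.0)

print(shaped_reward(1.0, button_pressed=False))  # 1.0
print(shaped_reward(1.0, button_pressed=True))   # -9.0
```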