The internal representations assumption was meant to be pretty broad, I didn’t mean that the network is explicitly representing a scalar reward function over observations or anything like that—e.g. these can be implicit representations of state features. I think this would also include the kind of representations you are assuming in the maze-solving post, e.g. cheese shards / circuits.
The internal representations assumption was meant to be pretty broad, I didn’t mean that the network is explicitly representing a scalar reward function over observations or anything like that—e.g. these can be implicit representations of state features. I think this would also include the kind of representations you are assuming in the maze-solving post, e.g. cheese shards / circuits.