habryka comments on Understanding and controlling a maze-solving policy network

habryka 4 Sep 2024 23:53 UTC
LW: 2 AF: 2
Often people talk about policies getting “selected for” on the basis of maximizing reward. Then, inductive biases serve as “tie breakers” among the reward-maximizing policies.
Does anyone do this? Under this picture, a data-memorizing model would basically always win out, which I’ve never really seen anyone predict. It seems clear that inductive biases do more than tie-breaking.