Often people talk about policies getting “selected for” on the basis of maximizing reward. Then, inductive biases serve as “tie breakers” among the reward-maximizing policies.
Does anyone do this? Under this model the data-memorizing model would basically always win out, which I’ve never really seen anyone predict. Seems clear that inductive biases do more than tie-breaking.
Does anyone do this? Under this model the data-memorizing model would basically always win out, which I’ve never really seen anyone predict. Seems clear that inductive biases do more than tie-breaking.