Who says we don’t want non-myopia? Those safety people?!
It seems like a lot of people would expect myopia by default, since the training process does nothing to incentivize non-myopia: “Why would the model care about what happens after an episode if it does not get rewarded for it?” I think this skepticism about non-myopia is one reason ML people are often skeptical of deceptive alignment concerns.
Another reason to expect myopia by default is that – to my knowledge – nobody has shown non-myopia occurring without meta-learning being applied.
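To make the intuition behind “nothing incentivizes non-myopia” concrete, here is a minimal sketch of the standard episodic objective: the return that drives the update only sums rewards inside the episode, so rewards that arrive after the episode boundary never enter the training signal. The numbers and the two-episode split below are made up purely for illustration.

```python
def episode_return(rewards, gamma=0.99):
    """Discounted return over a single episode; anything after the
    episode boundary never enters this sum, so it cannot be reinforced."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g

# Hypothetical two-episode rollout.
episode_1 = [0.0, 0.0, 1.0]
episode_2 = [5.0, 5.0, 5.0]   # large rewards, but in the *next* episode

# The training signal for episode 1 is computed from episode 1 alone:
print(episode_return(episode_1))   # ~0.98; episode_2 is invisible here
```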