It’s unclear if you’ve ordered these in a particular way. How likely do you think they each are? My ordering from most to least likely would probably be:
1. Inductive bias toward long-term goals
2. Meta-learning
3. Implicitly non-myopic objective functions
4. Simulating humans
5. Non-myopia enables deceptive alignment
6. (Acausal) trade
Why do you think this:
Non-myopia is interesting because it indicates a flaw in training – somehow our AI has started to care about something we did not design it to care about.
Who says we don’t want non-myopia, those safety people?! I guess to me it looks like the most likely reason we get non-myopia is that we don’t try that hard not to. This would be some combination of Meta-learning, Inductive bias toward long-term goals, and Implicitly non-myopic objective functions, as well as potentially “Training for non-myopia”.
Who says we don’t want non-myopia, those safety people?!
It seems like a lot of people would expect myopia by default since the training process does nothing to incentivize non-myopia. “Why would the model care about what happens after an episode if it does not get rewarded for it?” I think skepticism about non-myopia is a reason ML people are often skeptical of deceptive alignment concerns.
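(As a minimal sketch of that point – toy numbers and a made-up helper, nothing from the post: the quantity being optimized is a sum of rewards that stops at episode termination, so whatever happens afterwards never enters it.)

```python
# Toy sketch (hypothetical example): in standard episodic training, the
# objective is a discounted sum of rewards collected *inside* the episode.
def episode_return(rewards_within_episode, gamma=0.99):
    """Discounted return for a single episode; the sum ends at termination."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards_within_episode))

# Anything the agent's actions cause *after* this list ends contributes
# nothing to the objective, so there is no direct training signal toward
# caring about it.
print(episode_return([0.0, 1.0, 0.5]))  # 0.99*1 + 0.99**2*0.5 ≈ 1.48
```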
Another reason to expect myopia by default is that – to my knowledge – nobody has shown non-myopia occurring without meta-learning being applied.
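(To gesture at why meta-learning is the known exception – a toy numeric sketch with made-up payoffs and names, not an experiment from the post: an outer loop that scores a whole sequence of episodes will prefer behaviour whose payoff only arrives in later episodes, while a per-episode objective never sees that payoff.)

```python
# Toy sketch (made-up payoffs): contrast what a per-episode objective "sees"
# with what a meta-learning-style outer loop, which aggregates over a
# sequence of episodes, "sees".
def rollout(strategy, n_episodes=3):
    """Per-episode rewards for two hypothetical behaviours."""
    rewards = []
    for ep in range(n_episodes):
        if strategy == "myopic":
            rewards.append(1.0)                      # take the safe in-episode reward
        else:  # "invest"
            rewards.append(0.0 if ep == 0 else 2.0)  # pay now, profit in later episodes
    return rewards

for strategy in ("myopic", "invest"):
    r = rollout(strategy)
    print(strategy,
          "| first-episode objective:", r[0],     # per-episode training signal
          "| across-episode objective:", sum(r))  # outer-loop training signal

# The outer loop prefers "invest" even though the per-episode signal prefers
# "myopic" – selection pressure toward caring about what happens after an
# episode only appears once the objective spans episodes.
```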
Your ordering seems reasonable! My ordering in the post is fairly arbitrary. My goal was mostly to put easier examples early on.