Interesting. I realise now that ‘one shot’ is an overloaded term and perhaps a poor choice. I’m referring to ‘one action’, ‘one chance’, rather than ‘one training/prompt example’, which is how ‘one shot’ often gets used in ML.
The typical chess AI (or other boardgame-playing RL algorithm) is episode-myopic. Or at least, its training regime only explicitly incentivises returns over a single episode (e.g. policy gradient or value-based training pressures), and I don’t think we yet have artefacts that reify their goals in a way where misgeneralising to non-myopia is possible. It’s certainly not action-myopic though: that is the whole point of training to maximise return (aggregate reward) rather than single-step reward.
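To make the episode-myopic vs action-myopic distinction concrete, here is a minimal sketch (purely illustrative, not any particular engine’s training code; the function names and the toy reward sequence are hypothetical): an episode-myopic trainer credits each move with the return over the whole episode, whereas an action-myopic trainer would credit each move only with its own immediate reward.

```python
from typing import List


def episode_return(rewards: List[float], gamma: float = 1.0) -> float:
    """Return aggregated over the whole episode: the quantity that
    policy-gradient and value-based training explicitly incentivise."""
    return sum(r * gamma**t for t, r in enumerate(rewards))


def action_myopic_targets(rewards: List[float]) -> List[float]:
    """What a hypothetical action-myopic trainer would optimise instead:
    each move is credited only with its immediate reward, ignoring
    everything downstream."""
    return list(rewards)


# A toy chess episode: no reward until the final move, which wins the game.
rewards = [0.0, 0.0, 0.0, 1.0]
print(episode_return(rewards))         # 1.0 -> early moves share credit via the return
print(action_myopic_targets(rewards))  # [0.0, 0.0, 0.0, 1.0] -> early moves get nothing
```

Under the first objective, the early non-rewarded moves still matter to training; under the second, they don’t, which is why the usual chess AI is not action-myopic even though it is episode-myopic.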
I’m not entirely sure what it would mean for an actor-moment to be myopic, but I imagine it would at minimum have to be somehow ‘indifferent’ to the presence or absence of relevantly similar actor-moments in the future.