Even if you can code in the number of chickens as an input to the reward function, that doesn’t mean you can reliably get the agent to generalize to protect chickens. That input probably makes the task easier than in Challenge Mode, but not necessarily easy. The agent could generalize to some other correlate, such as ensuring there are no skeletons nearby (because they might shoot nearby chickens), but not in order to protect the chickens.
So, if I understand correctly, the way we would consider it likely that the correct generalisation had happened would be if the agent could generalise to hazards it had never actually seen kill chickens before? And this would require the agent to have an actual model of how chickens can be threatened, such that it could predict that lava would destroy chickens based on, say, its knowledge that it will die if it jumps into lava, which is beyond current capabilities?
Yes, that would be the desired generalization in the situations we checked. If that happens, it means we had specified a behavioral generalization property, written down how we were going to get it, and then been right in predicting that that training rationale would go through.
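For concreteness, here is a minimal sketch of what “coding in the number of chickens as an input to the reward function” might look like. It is not from the project being discussed: the wrapper class, the dict observation with a `nearby_entities` field, the older Gym step/reset signatures, and the reward weight are all illustrative assumptions. The point is only that a chicken-count signal can be wired into the reward while the trained policy could still latch onto a correlate (such as “no skeletons nearby”) rather than onto protecting chickens.

```python
# A minimal, hypothetical sketch: none of these names come from the
# discussion above. Assumes an env with dict observations containing a
# "nearby_entities" list, and the older Gym API (4-tuple step).

import gym


class ChickenCountReward(gym.Wrapper):
    """Adds a shaping term based on how many chickens are still alive.

    Even with this signal, the policy may generalize to a correlate
    (e.g. "keep skeletons away") rather than to protecting chickens.
    """

    def __init__(self, env, weight: float = 1.0):
        super().__init__(env)
        self.weight = weight
        self.prev_count = None

    def _chicken_count(self, obs) -> int:
        # Hypothetical accessor: assumes the env exposes nearby entities.
        return sum(1 for e in obs.get("nearby_entities", []) if e == "chicken")

    def reset(self, **kwargs):
        obs = self.env.reset(**kwargs)
        self.prev_count = self._chicken_count(obs)
        return obs

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        count = self._chicken_count(obs)
        # Penalize any drop (and reward any rise) in the chicken count.
        reward += self.weight * (count - self.prev_count)
        self.prev_count = count
        return obs, reward, done, info
```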