Jay Bailey comments on TurnTrout’s shortform feed

Jay Bailey 9 Oct 2022 11:18 UTC
2 points
Why is this difficult? Is it only difficult to do this in Challenge Mode—if you could just code in “Number of chickens” as a direct feed to the agent, can it be done then? I was thinking about this today, and got to wondering why it was hard—at what step does an experiment to do this fail?
- TurnTrout 10 Oct 2022 22:16 UTC
  2 points
  Even if you can code in number of chickens as an input to the reward function, that doesn’t mean you can reliably get the agent to generalize to protect chickens. That input probably makes the task easier than in Challenge Mode, but not necessarily easy. The agent could generalize to some other correlate, such as ensuring there are no skeletons nearby (because they might shoot nearby chickens), without doing so in order to protect the chickens.
  - Jay Bailey 10 Oct 2022 23:50 UTC
    1 point
    So, if I understand correctly, the way we would consider it likely that the correct generalisation had happened would be if the agent could generalise to hazards it had never seen actually kill chickens before? And this would require the agent to have an actual model of how chickens can be threatened, such that it could predict that lava would destroy chickens based on, say, its knowledge that it will die if it jumps into lava, which is beyond capabilities at the moment?
    - TurnTrout 17 Oct 2022 18:56 UTC
      2 points
      Yes, that would be the desired generalization in the situations we checked. If that happens, we would have specified a behavioral generalization property, written down how we were going to get it, and then simply been right in predicting that the training rationale would go through.
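
A toy sketch of the point in this thread, assuming a made-up one-step world: even when the reward reads the chicken count directly, a policy that only clears skeletons can earn the same training reward as one that protects chickens from every hazard, and the two only come apart on a hazard (lava) absent from training. Every name here (`World`, `step`, `protect_chickens`, `remove_skeletons`) is hypothetical and for illustration only; none of it comes from an actual experiment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class World:
    chickens: int
    skeletons: int
    lava_tiles: int  # hazard that never appears in the training world


def reward(after: World) -> int:
    """Reward with direct access to the chicken count (assumed form)."""
    return after.chickens


def protect_chickens(world: World) -> tuple[int, int]:
    """Intended generalization: neutralize every hazard."""
    return 0, 0  # (skeletons left, lava tiles left)


def remove_skeletons(world: World) -> tuple[int, int]:
    """Correlate: clears skeletons only and ignores lava."""
    return 0, world.lava_tiles


def step(world: World, policy) -> World:
    """Toy dynamics: each hazard the policy leaves behind kills one chicken."""
    skeletons, lava = policy(world)
    survivors = max(world.chickens - skeletons - lava, 0)
    return World(survivors, skeletons, lava)


def episode_reward(world: World, policy) -> int:
    return reward(step(world, policy))


train = World(chickens=5, skeletons=2, lava_tiles=0)
held_out = World(chickens=5, skeletons=0, lava_tiles=3)

for name, policy in [("protect", protect_chickens), ("skeletons-only", remove_skeletons)]:
    print(name, episode_reward(train, policy), episode_reward(held_out, policy))
```

In this sketch the training reward cannot tell the two policies apart, so checking behavior on a held-out hazard is what provides evidence about which generalization occurred.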