So, if I understand correctly, the way we would consider it likely that the correct generalisation had happened would be if the agent could generalise to hazards it had never actually seen kill chickens before? And this would require the agent to have an actual model of how chickens can be threatened, such that it could predict that lava would destroy chickens based on, say, its knowledge that it will die if it jumps into lava, which is beyond capabilities at the moment?
Yes, that would be the desired generalization in the situations we checked. If that happens, it means we specified a behavioral generalization property, wrote down how we were going to get it, and were simply right in predicting that the training rationale would go through.