The broad point they want to convey with the word “generalisation”, namely that two systems can exhibit the same desired behaviour in training yet end up pursuing completely different goals in testing or deployment, seems fair as a statement of the general problem. But I agree that “generalise” can give the impression of an “intentional act of extrapolation”, of building a model that is consistent with a certain specification. And there are many more ways an AI can behave well in training and badly in deployment, without any need to assume it is extrapolating a model.
For example, when the specification is to make people happy, two systems can both tell jokes in training, yet one may end up pumping people full of opioids and the other may end up with no consideration for happiness at all. Any of these failure modes, or others, could occur even if we were sure their behaviour in training was consistent with the programmers’ goal.