Doomers worry about AIs developing “misaligned” values. But in this scenario, the “values” implicit in AI actions are roughly chosen by the organisations who make them and by the customers who use them.
First, there is reason to think “roughly” aligned isn’t enough in the case of a sufficiently capable system.
Second, Robin’s statement seems to ignore (or contradict without argument) the possibility that, even if it holds for systems no smarter than humans, there may be a “sharp left turn” at some point where, in Nate Soares’ words, “as systems start to work really well in domains really far beyond the environments of their training,” “it’s predictably the case that the alignment of the system will fail to generalize with it.”