I’ll try to answer this since no one else has yet, though I’m not very confident in my answer. I think you’re accurately summarizing a shift, and it’s about learning to walk before you learn to run: if we can’t “align” an AI to a goal as simple as maximizing the number of paperclips (say), then we surely can’t align it to human values. See the concept of “mesa-optimizers” for some relevant ideas. I think this used to be seen as less of an issue because traditional AI systems like Deep Blue had no difficulty following a prescribed goal such as “win at chess,” whereas modern ML methods make the problem more obvious.