In older texts on AI alignment, there seems to be quite a lot of discussion of how to learn human values, for example here:
https://ai-alignment.com/the-easy-goal-inference-problem-is-still-hard-fad030e0a876
My impression is that nowadays, the alignment problem seems more focused on something I would describe as “teach the AI to follow any goal at all”, as if the goal we should align the AI with doesn’t matter as much from a research perspective.
Could someone provide some insight into the reasons for this? Or are my impressions wrong and I’ve hallucinated the shift?
I’ll try to answer this since no one else has yet, but I’m not super confident in my answer. You’re accurately summarizing a real shift, and it’s about learning to walk before you learn to run. If we can’t “align” an AI to optimize the number of paperclips (say), then we surely can’t align it to human values. See the concept of ‘mesa-optimizers’ for some relevant ideas. I think this used to be seen as less of an issue, since traditional AI systems like Deep Blue had no difficulty following a prescribed goal of “win at chess”, while modern ML methods make the problem much more apparent.