Speaking for the YouTube Recommender pessimists, the problems I see are, first, the training data, and second, the human overseers.
Training just on video names and origins, along with watch statistics across all users, doesn't seem to reward clever planning in the physical world until after the AI is already generally intelligent. This is the exact opposite of how you'd design an environment to encourage general real-world planning. (Curriculum learning is a thing for a reason.)
Second, the people training the thing aren't trying to make AGI. They're totally fine with merely extracting the surface-level regularities of the data rather than the structure of the physical world, so they won't keep jamming in more resources after it plateaus. If they're at all worried about planning, they'll make the reward function myopic (though perhaps they should already be doing this and aren't, because of the $$$, which is concerning).
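To make "myopic reward function" concrete, here's a minimal sketch, assuming a toy REINFORCE-style recommender in PyTorch. All names (VideoScorer, immediate_engagement, etc.) are hypothetical, not any real recommender's API; the point is just that setting the discount factor to zero means each recommendation is credited only for its immediate engagement, never for its downstream effect on the user.

```python
import torch
import torch.nn as nn

GAMMA = 0.0  # myopic: zero credit for downstream effects on user behavior


class VideoScorer(nn.Module):
    """Toy policy that scores candidate videos from a user-state feature vector."""

    def __init__(self, state_dim: int, num_videos: int):
        super().__init__()
        self.net = nn.Linear(state_dim, num_videos)

    def forward(self, state: torch.Tensor) -> torch.distributions.Categorical:
        return torch.distributions.Categorical(logits=self.net(state))


def discounted_returns(rewards, gamma):
    """Standard discounted return. With gamma = 0 each return collapses to the
    immediate reward, so multi-step plans that steer future user behavior are
    never reinforced."""
    returns, running = [], 0.0
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    return list(reversed(returns))


def policy_gradient_step(policy, optimizer, states, actions, rewards, gamma=GAMMA):
    """One REINFORCE update; the only thing making it 'myopic' is gamma = 0."""
    returns = discounted_returns(rewards, gamma)
    loss = torch.tensor(0.0)
    for state, action, ret in zip(states, actions, returns):
        loss = loss - policy(state).log_prob(action) * ret
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()


# Usage with fake data: two recommendations and their immediate engagement.
policy = VideoScorer(state_dim=8, num_videos=100)
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)
states = [torch.randn(8), torch.randn(8)]
actions = [torch.tensor(3), torch.tensor(42)]
immediate_engagement = [0.7, 0.1]
policy_gradient_step(policy, optimizer, states, actions, immediate_engagement)
```

The worry in the parenthetical is that a non-myopic version is the same code with gamma > 0, which is exactly the change you'd make if longer watch sessions paid better.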