"Early work tends to be less relevant in the context of modern machine learning."
I’m curious why you think the orthogonality thesis, instrumental convergence, the treacherous turn, or Goodhart’s law arguments are less relevant in the context of modern machine learning. (For concreteness, we can use Facebook’s feed-creation algorithm as an example of modern machine learning.)
It seems to me that the orthogonality thesis doesn’t apply to modern machine learning. None of the techniques we know of for training AI systems to general capabilities seems compatible with creating a paperclip maximiser (see the sketch after this list):
Unsupervised learning on human generated/curated data
Reinforcement learning from human feedback
Imitation learning
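To make the second item concrete, here is a minimal sketch of the reward-modelling step at the heart of RLHF. Everything in it (the features, the hidden weights, the learning rate) is invented for illustration; it is not any lab’s actual pipeline. The point is structural: the only training signal is which of two outputs a (simulated) human preferred, so the learned reward is anchored to human judgements, and there is no channel through which an arbitrary objective like paperclip-maximisation could enter.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hidden weights stand in for whatever human raters actually care about.
true_human_weights = np.array([1.0, -0.5, 0.3])

# Each candidate output is summarised as a feature vector; a simulated
# annotator labels which of the two outputs in each pair they prefer.
a_feats = rng.normal(size=(2000, 3))
b_feats = rng.normal(size=(2000, 3))
labels = (a_feats @ true_human_weights > b_feats @ true_human_weights).astype(float)

# Fit a linear reward r(x) = w @ x by gradient ascent on the Bradley-Terry
# log-likelihood: P(a preferred over b) = sigmoid(r(a) - r(b)). Pairwise
# human preferences are the *only* training signal the reward model sees.
diffs = a_feats - b_feats
w = np.zeros(3)
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(diffs @ w)))
    w += 0.5 * (labels - p) @ diffs / len(diffs)

# The learned reward recovers the human direction (up to scale); there is
# no free parameter one could set to "paperclips".
print("learned direction:", np.round(w / np.linalg.norm(w), 3))
print("human   direction:", np.round(true_human_weights / np.linalg.norm(true_human_weights), 3))
```

A linear toy elides everything hard about reward modelling, of course, but it shows where the objective comes from: human comparisons, not a goal slot that could be filled with anything whatsoever.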
This is especially the case if shard theory is true: training an AI across diverse domains/tasks may, of necessity, result in the AI forming value shards useful/relevant to those tasks.
That is, even if we wanted to, we couldn’t necessarily create a generally intelligent paperclip maximiser.
We could create an agent that values paperclips (humans value paperclips to some extent), but most agents that value paperclips aren’t paperclip maximisers.
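A toy calculation makes the gap between valuing and maximising vivid. The utility function and the weights below are invented for this comment: give an agent log utility over paperclips and over everything else, with only a small weight on paperclips, and its optimal policy spends a correspondingly small fraction of its resources on them.

```python
import numpy as np

# Toy agent with a paperclip "value" among other values (all numbers invented).
# Utility is Cobb-Douglas-style: u = 0.1*log(paperclips) + 0.9*log(other),
# with one unit of resources to split between the two.
weights = {"paperclips": 0.1, "other_goods": 0.9}

def utility(paperclip_share):
    other_share = 1.0 - paperclip_share
    return (weights["paperclips"] * np.log(paperclip_share)
            + weights["other_goods"] * np.log(other_share))

shares = np.linspace(0.001, 0.999, 999)
best = shares[np.argmax([utility(s) for s in shares])]

# The optimum puts ~10% of resources into paperclips, not 100%: this agent
# values paperclips but is nowhere near a paperclip maximiser.
print(f"optimal paperclip share: {best:.3f}")
```

With a 0.1 weight the optimum is a 10% paperclip share; nothing about merely valuing paperclips pushes the agent toward converting everything into them.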