It seems to me that the orthogonality thesis doesn’t apply to modern machine learning. None of the techniques we know of for training AI systems to general capabilities seem compatible with creating a paperclip maximiser:
- Unsupervised learning on human-generated/curated data
- Reinforcement learning from human feedback
- Imitation learning
This is especially the case if shard theory is true: training an AI across diverse domains/tasks may necessarily result in the AI forming value shards useful for and relevant to those tasks.
That is, even if we wanted to, we couldn’t necessarily create a generally intelligent paperclip maximiser.
We could create an agent that values paperclips (humans value paperclips to some extent), but most agents that value paperclips aren’t paperclip maximisers.