To answer the first question, it’s because no one will actually create/train a paperclip maximiser. The scenario holds water if such an AI is created, but none will be.
People scrutinising that hypothetical and rightly dismissing it may overupdate towards AI risk not being a serious concern. It’s a problem if the canonical thought experiment of AI misalignment is not very realistic and therefore easily dismissed.
It probably stuck around because of ~founder effects.
Do you mean that no one will actually create a paperclip maximizer specifically, or no agent of that kind at all? I.e. with goals such as “collect stamps” or “generate images”? Because I think Eliezer meant to object to that class of examples, rather than only that specific one, but I’m not sure.
We probably wouldn’t uncritically let loose an AI whose objective was to maximise the quantity of some physical stuff (paperclips, stamps, etc.). If we commit a (very) stupid outer alignment failure, we’re more likely to train an AI to maximise “happiness” or something similar.
I agree with you here, although something like “predict the next token” seems more and more likely. That said, I’m not sure whether it belongs to the same class of goals as paperclip maximizing in this context, or whether the kind of failure it could lead to would be similar.