There are several problems with this argument. First, the AI has code describing its goal. It would seem much easier to copy this code across than to turn our moral values into code. Second, the AI doesn’t have to be confident in getting it right. A paperclipping AI has two options: it can work at a factory and make a few paperclips, or it can self-improve, accepting some chance that the resultant AI won’t maximize paperclips. However, the number of paperclips it could produce if it successfully self-improves is astronomically vast. If its goal function is linear in paperclips, it will self-improve if it thinks it has any chance of getting it right. If it fails to preserve its values as it self-improves, the result looks like a staple maximizer.
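To make the expected-value point concrete, here is a minimal sketch with invented numbers (none of these figures come from the original argument; they only illustrate why linear utility plus an astronomical payoff swamps a tiny success probability):

```python
# Illustrative only: made-up numbers for the expected-value argument above.
# A goal function linear in paperclips compares two options:
#   1) keep working at the factory (modest, near-certain output)
#   2) attempt self-improvement (small success probability, astronomical payoff)

factory_paperclips = 1e6       # hypothetical lifetime output of one factory
successor_paperclips = 1e30    # hypothetical output of a successful successor AI
p_values_preserved = 1e-6      # hypothetical chance self-improvement keeps the goal intact

ev_factory = factory_paperclips
ev_self_improve = p_values_preserved * successor_paperclips

# With linear utility, even a tiny success probability dominates the safe option.
print(ev_factory, ev_self_improve)   # 1e6 vs 1e24
print(ev_self_improve > ev_factory)  # True
```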
Humans (at least the sort thinking about AI) know that we all have roughly similar values, so if you think you might have solved alignment but aren’t sure, it makes sense to ask others to help you, or to wait for someone else to finish solving it.
However, a paperclipping AI would know that no other AI had its goal function. If it doesn’t build a paperclipping superintelligence, no one else is going to. It will therefore try to do so even if it is unlikely to succeed.
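The asymmetry between the cautious human and the paperclipper can be sketched the same way (again, the probabilities are invented purely for illustration):

```python
# Illustrative only: invented probabilities for the wait-vs-act asymmetry above.

p_my_attempt_succeeds = 0.01   # hypothetical chance my risky attempt preserves the goal
p_someone_else_solves = 0.9    # hypothetical chance another human aligns AI with ~shared values

# Human perspective: others share my values, so waiting still captures most of the value.
human_value_of_waiting = p_someone_else_solves   # ~0.9 of what I care about
human_value_of_acting = p_my_attempt_succeeds    # ~0.01

# Paperclipper perspective: no one else will ever build a paperclip maximizer.
clippy_value_of_waiting = 0.0
clippy_value_of_acting = p_my_attempt_succeeds   # small, but strictly better than zero

print(human_value_of_waiting > human_value_of_acting)    # True: wait, or ask for help
print(clippy_value_of_acting > clippy_value_of_waiting)  # True: act even if unlikely to succeed
```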
There are several problems with this argument. First, the AI has code describing its goal. It would seem much easier to copy this code across than to turn our moral values into code.
“Has code” and “has code that’s in an unpluggable and reusable module” are two different things.