If I have a system which is manifestly maximizing a utility function I can understand, then I have some hope of proving claims about its behavior. I believe that designing such a system is possible, but right now it seems way out of our depth.
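To make "manifestly maximizing a utility function I can understand" concrete, here is a minimal toy sketch (my own illustration, not a real proposal): an agent whose decision rule is literally "take the argmax of an explicitly written-down utility function over the available actions." For something this transparent, claims like "it never picks an action whose utility is lower than some other available action's" follow directly from reading the code, which is exactly the property that makes proofs about behavior seem tractable.

```python
# Toy sketch (hypothetical): an agent that "manifestly" maximizes an
# explicit utility function. Both the utility function and the decision
# rule are written down, so simple behavioral claims are easy to prove.

from typing import Callable, Iterable, TypeVar

Action = TypeVar("Action")

def transparent_agent(
    actions: Iterable[Action],
    utility: Callable[[Action], float],
) -> Action:
    """Pick the action with the highest utility, explicitly."""
    return max(actions, key=utility)

# Hypothetical utility over a small discrete action set.
if __name__ == "__main__":
    chosen = transparent_agent(
        actions=["wait", "explore", "report"],
        utility=lambda a: {"wait": 0.1, "explore": 0.7, "report": 0.4}[a],
    )
    print(chosen)  # "explore" -- provably the argmax of the stated utility
```

The contrast in the next paragraph is that nothing we would recognize as intelligent actually looks like this from the outside.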
Every example of intelligence we have ever seen behaves in an incredibly complicated way that isn't manifestly maximizing any utility function at all. The way things are going, it looks like the first AI humans come up with (at least if they come up with one in the next 50 years) will probably be similarly difficult to analyze formally.
I think someone who worked on friendliness would agree, but I don’t know. I have only started thinking about these issues recently, so I may be way off base.