I think that it’s mostly Eliezer who believes so strongly in utility functions. Nate Soares’ post Deep Deceptiveness, which I claim is a central part of the MIRI threat model insofar as there is one, doesn’t require an agent coherent enough to satisfy VNM over world-states. In fact, the agent it describes can depart from coherence in several ways and still be capable and dangerous:
It can flinch away from dangerous thoughts;
Its goals can drift over time;
Its preferences can be incomplete;
Maybe every 100 seconds it randomly gets distracted for 10 seconds.
The important property is that it has a goal about the real world, applies general problem-solving skills to achieve it, and has no stable desire to use its full intelligence to be helpful to or good for humans. No one has formalized this, so no one has proved interesting things about such an agent model.
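To make the picture slightly more concrete, here is a purely illustrative toy sketch in Python (every name is invented for the example, and it is emphatically not the missing formalization): an agent that exhibits all four departures from coherence above and still pushes its toy world toward its goal, never consulting what humans want along the way.

```python
# Purely illustrative toy, not a formalization. All names are made up.
import random

random.seed(0)


class IncoherentGoalDirectedAgent:
    """Toy agent: goal-directed but incoherent in the four ways listed above."""

    def __init__(self):
        self.goal = 100.0    # how much of the toy "resource" it wants the world to contain
        self.step_count = 0

    def propose_plans(self):
        # Stand-in for general problem solving: candidate actions with the
        # amount of resource each is expected to gain.
        return [("grab", 1.0), ("build_tool", 3.0), ("risky_hack", 5.0)]

    def choose(self, world_resource):
        self.step_count += 1

        # (4) Every ~100 steps it gets distracted for ~10 steps and does nothing.
        if self.step_count % 100 < 10:
            return None

        # (2) Goal drift: the target wanders slightly over time.
        self.goal += random.gauss(0, 0.1)

        # Goal-directedness: stop acting once the world matches the (current) goal.
        if world_resource >= self.goal:
            return None

        plans = self.propose_plans()

        # (1) Flinching: it refuses to even consider plans it finds "scary".
        plans = [p for p in plans if p[0] != "risky_hack"]

        # (3) Incomplete preferences: sometimes plans are incomparable to it,
        # so it picks arbitrarily instead of maximizing.
        if random.random() < 0.3:
            return random.choice(plans)
        return max(plans, key=lambda p: p[1])


world_resource = 0.0
agent = IncoherentGoalDirectedAgent()
for _ in range(1000):
    action = agent.choose(world_resource)
    if action is not None:
        world_resource += action[1]  # the toy world moves toward the agent's goal
# Note that nothing above ever consults what humans want.
print(f"resource acquired: {world_resource:.1f} (goal drifted to {agent.goal:.1f})")
```

The point of the sketch is only that none of the listed incoherences prevent the loop from steering the world toward the agent's goal; it says nothing about what a realistic such agent would look like.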
I would be somewhat surprised if Eliezer and Nate disagreed very much here, though you might know better. So I would mostly see Nate’s post as a clarification of both Eliezer’s and Nate’s views.
Based on my experience working with Nate and Vivek, I do think they disagree. Eliezer has said he has shared only 40% of his models with even Nate, for infosec reasons [1] (which surprised me!), so it isn’t surprising to me that they would have different views. Though I don’t know Eliezer well, I think he does believe in the basic point of Deep Deceptiveness (because it’s pretty basic) but also believes in coherence/utility functions more than Nate does. I can maybe say more privately, but if it’s important, asking one of them directly would be better.
[1] This was a while ago, so he might actually have said that Nate only has 40% of his models. Either way, my conclusion holds.