I think that people who imagine that “tracking progress using signals integrated with other signals” feels anything like happiness feels to them from the inside—while holding that imagination and also loudly insisting that it will be very alien happiness or much simpler happiness or whatever—are simply making a mistake-of-fact, and I am just plain skeptical that there is a real values difference there that would survive their learning what I know about how minds and qualia work. I of course fully expect these people to loudly proclaim that I could not possibly know anything they don’t, despite their own confusion about these matters, which they lack the skill to reflect on as confusion, and to exchange some wise smiles about those silly people who think that people disagree because of mistakes rather than values differences.
Trade opportunities are unfortunately ruled out by our inability to model those minds well enough that, if some part of them decided to seize an opportunity to Defect, we would’ve seen it coming in the past and counter-Defected. If we Cooperate, we’ll be nothing but CooperateBot, and they, I’m afraid, will be PrudentBot, not FairBot.
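For readers who haven’t run into the CooperateBot/FairBot/PrudentBot distinction before, here is a minimal toy sketch of it in Python. This is not the actual construction from the open-source game theory work: the real FairBot and PrudentBot reason about provability and lean on Löb’s theorem, whereas this sketch (all names and the depth parameter are illustrative inventions) replaces proof search with bounded simulation. It only illustrates the one contrast at issue in the comment above: FairBot rewards an unconditional cooperator, while PrudentBot exploits it.

```python
# Toy sketch under the assumptions stated above: "provability" is replaced by
# bounded simulation, which loses the Löbian reasoning of the real modal agents
# (e.g. this FairBot will NOT cooperate with a copy of itself), but it does
# reproduce how each agent treats an unconditional cooperator.

C, D = "C", "D"

def cooperate_bot(opponent, depth):
    """Cooperates unconditionally, no matter who the opponent is."""
    return C

def defect_bot(opponent, depth):
    """Defects unconditionally."""
    return D

def fair_bot(opponent, depth):
    """Cooperates iff bounded simulation says the opponent cooperates with it."""
    if depth == 0:
        return D  # simulation budget exhausted: fail safe to defection
    return C if opponent(fair_bot, depth - 1) == C else D

def prudent_bot(opponent, depth):
    """Cooperates iff the opponent cooperates with it AND the opponent
    defects against DefectBot, i.e. the opponent is not exploitable."""
    if depth == 0:
        return D
    cooperates_with_me = opponent(prudent_bot, depth - 1) == C
    not_exploitable = opponent(defect_bot, depth - 1) == D
    return C if (cooperates_with_me and not_exploitable) else D

if __name__ == "__main__":
    print("FairBot vs CooperateBot:   ", fair_bot(cooperate_bot, depth=3))     # C
    print("PrudentBot vs CooperateBot:", prudent_bot(cooperate_bot, depth=3))  # D
```

The contrast being invoked: a PrudentBot-style counterpart exploits anything that cooperates unconditionally, which is why being “nothing but CooperateBot” buys you nothing here.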
This shouldn’t matter for anyone besides me, but there’s something personally heartbreaking about seeing the one bit of research for which I feel comfortable claiming a fraction of a point of dignity being validly cited to argue why decision theory won’t save us.
(Modal bargaining agents didn’t turn out to be helpful, but given the state of knowledge at that time, it was worth doing.)
Sorry.
It would be dying with a lot less dignity if everyone on Earth—not just the managers of the AGI company making the decision to kill us—thought that all you needed to do was be CooperateBot, and had no words for any sharper concepts than that. Thank you for that, Patrick.
But sorry anyways.
To clarify, you mean “mistake-of-fact” in the sense that those same people might also apply to other high-level concepts? Because at a low enough resolution, happiness is like “tracking progress using signals integrated with other signals”, so it is at least not inconsistent to preserve this part of your utility function at such a low resolution.