the point i was trying to make is that if you expect someone to reliably implement LDT, then you can expect to be rewarded for helping them (actually helping them) solve alignment, because they’d be the kind of agent who, if they solve alignment, will retroactively allocate some of their utility function handshake to you.
LDT-ers reliably one-box, and LDT-ers reliably retroactively-reward people who help them, including in ways that they can’t perceive before alignment is solved.
it’s not about “doing something nice”, it’s about LDT agents who end up doing well retroactively repaying the agents who helped them get there, because being the kind of agent who reliably does that causes them to more often do well.
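to make that concrete, here’s a minimal toy sketch of the incentive (every number in it is made up purely for illustration, it’s not part of the argument above): an agent who is predictably the kind of agent that repays its helpers attracts more help, and so does better in expectation even after paying out the shares.

```python
# toy model (made-up numbers, purely illustrative): an agent who reliably
# repays its helpers attracts more help, and so ends up better off even
# after paying out shares, compared to an agent who keeps everything.

# assumptions for the sketch (not taken from the discussion above):
HELP_BOOST = 0.10        # each helper raises P(agent succeeds) by this much
BASE_SUCCESS = 0.20      # P(success) with no helpers
LIGHTCONE_VALUE = 1000.0 # value to the agent of ending up in control
REPAID_SHARE = 0.01      # share handed back to each helper, if reliable

def expected_value(reliably_repays: bool, num_potential_helpers: int) -> float:
    # in this toy model, helpers only bother helping an agent
    # they predict will repay them
    helpers = num_potential_helpers if reliably_repays else 0
    p_success = min(1.0, BASE_SUCCESS + HELP_BOOST * helpers)
    kept_fraction = 1.0 - (REPAID_SHARE * helpers if reliably_repays else 0.0)
    return p_success * LIGHTCONE_VALUE * kept_fraction

print("reliably repays:", expected_value(True, 5))   # 0.70 * 1000 * 0.95 = 665.0
print("never repays:   ", expected_value(False, 5))  # 0.20 * 1000 * 1.00 = 200.0
```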
Yes, and the point I am making is that this is not what LDT is or how it works. LDT agents play PrudentBot, not FairBot. An AGI will only reward you with cooperation if your cooperation is conditional on its cooperation, and that is something you’re unable to “condition” on, because it would mean looking at the AGI’s code and analyzing it beyond what anyone is capable of at present.
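To spell out the PrudentBot/FairBot distinction, here is a toy sketch. The actual modal agents from the “Robust Cooperation in the Prisoner’s Dilemma” paper decide by searching for proofs about their opponent’s source code; this sketch replaces proof search with directly running the opponent, which only works for simple opponents that ignore who they are playing against, so treat it as an illustration of the behaviour rather than the real construction.

```python
# Toy, non-self-referential sketch of FairBot vs PrudentBot behaviour
# against the simple opponents CooperateBot and DefectBot.
# The real bots search for proofs about the opponent's source code;
# here that is replaced by directly running the opponent, which is only
# valid because these opponents ignore who they are playing against.

C, D = "C", "D"

def CooperateBot(opponent):
    return C

def DefectBot(opponent):
    return D

def FairBot(opponent):
    # "Cooperate if I can establish that the opponent cooperates with me."
    return C if opponent(FairBot) == C else D

def PrudentBot(opponent):
    # "Cooperate only if the opponent cooperates with me AND would
    # defect against DefectBot", i.e. only reward conditional cooperation.
    cooperates_with_me = opponent(PrudentBot) == C
    punishes_defectbot = opponent(DefectBot) == D
    return C if (cooperates_with_me and punishes_defectbot) else D

for bot in (CooperateBot, DefectBot):
    print(bot.__name__, "-> FairBot:", FairBot(bot), "| PrudentBot:", PrudentBot(bot))
# CooperateBot -> FairBot: C | PrudentBot: D
# DefectBot   -> FairBot: D | PrudentBot: D
```

The relevant behaviour: FairBot rewards even unconditional cooperation, while PrudentBot exploits CooperateBot and only rewards cooperation it can verify is conditional.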
i have read that post before and i do not think that it applies here? can you please expand on your disagreement?
Tamsin Leake does not have the kind of info on who/what will control the lightcone that would allow them to cooperate in PDs.
you don’t need to know this to probabilistically-help whoever will control the lightcone, right? if you take actions that help them-whoever-they-are, then you’re getting some of that share from them-whoever-they-are. (i think?)
My point is not that you can’t affect the outcome of the future. That may also be impossible, but regardless, any intervention you make will be independent of whether or not the person you’re rewarding gives you a share of the lightcone. You can’t actually tell in advance whether or not that AI/person is going to give you that share, in the sense that would incentivize someone to give it to you after they’ve already seized control.
you don’t think there are humans whom i can expect to reliably reward-me-as-per-LDT after-the-fact? it doesn’t have to be a certainty, i can merely have some confidence that some person will give me that share, and weigh the action based on that confidence.
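a minimal sketch of what i mean by weighing the action on that confidence (all the numbers are placeholders i’m inventing for the example, not actual estimates):

```python
# expected value of helping, weighted by my confidence that the helped
# party reliably implements LDT and will hand back a share after the fact.
# every number here is a made-up placeholder, not an actual estimate.

p_reliable = 0.3        # confidence they reliably repay helpers, LDT-style
share_if_repaid = 50.0  # value to me of the share they'd hand back
cost_of_helping = 5.0   # what the help costs me up front

expected_gain = p_reliable * share_if_repaid - cost_of_helping
print(expected_gain)  # 0.3 * 50 - 5 = 10.0 -> worth doing under these numbers
```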
That might happen, but they wouldn’t be doing it because they’re maximizing their utility via acausal trade; they’d be doing it because they value reciprocity.
why wouldn’t it be because they’re maximizing their utility via acausal trade?
do you also think people who don’t-intrinsically-value-reciprocity are doomed to never get picked up by rational agents in parfit’s hitchhiker? or doomed to two-box in newcomb?
to take an example: i would expect that even if he didn’t value reciprocity at all, yudkowsky would reliably cooperate as the hitchhiker in parfit’s hitchhiker, or one-box in newcomb, or retroactively-give-utility-function-shares-to-people-who-helped-if-he-grabbed-the-lightcone. he seems like the-kind-of-person-who-tries-to-reliably-implement-LDT.
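for the parfit’s-hitchhiker case, a toy payoff sketch (with arbitrary placeholder numbers) shows why the reliably-paying kind of agent does better, even though paying is a pure loss at the moment they pay: the driver only picks up the agent they predict will pay once in town.

```python
# toy parfit's hitchhiker payoffs (arbitrary placeholder numbers):
# the driver picks you up only if they predict you'll pay once in town.
VALUE_OF_RESCUE = 1_000_000  # value of not dying in the desert
PAYMENT = 100                # what you pay the driver afterwards

def outcome(predicted_to_pay: bool) -> int:
    if not predicted_to_pay:
        return 0                      # left in the desert
    return VALUE_OF_RESCUE - PAYMENT  # rescued, then you pay as predicted

print("reliably pays:", outcome(True))   # 999900
print("would renege:", outcome(False))   # 0
```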