OK, this whole conversation is being downvoted (by the same people?)
Fair enough, this is rather dragging on. I’ll try and wrap things up by addressing my own argument there.
What separates a precommitted!blackmailer from an honest bargainer in a standard acausal prisoner’s dilemma, offering to increase your utility by rescuing thousands of potential torture victims from the deathtrap created by another agent?
We want to avoid supporting agents that create problems for us. So nothing, if the honest agent has a utility function similar to the torturer’s (and thus rewarding them gives the torturer an incentive to arrange such a situation).
Thus, creating such an honest agent (such as—importantly—by self-modifying in order to “precommit”) is subject to the same incentives as just blackmailing us normally.
I’ll try and wrap things up by addressing my own argument there.
I’ll join you by mostly agreeing and expressing a small difference in the way TDT-like reasoners may see the situation.
What separates a precommitted!blackmailer from an honest bargainer in a standard acausal prisoner’s dilemma, offering to increase your utility by rescuing thousands of potential torture victims from the deathtrap created by another agent?
We want to avoid supporting agents that create problems for us. So nothing, if the honest agent has a utility function similar to the torturer’s (and thus rewarding them gives the torturer an incentive to arrange such a situation).
This is a good heuristic. It certainly handles most plausible situations. However, in principle a TDT agent will make a distinction between the blackmailer and the agent offering to rescue the torture victims for a payment. It will even pay an agent who just happens to value torturing folk to not torture folk. This applies even if these honest agents happen to have values similar to the UFAI/torturer’s.
The line I draw (and it is a tricky concept that is hard to express, so I cannot hope to speak for other TDT-like thinkers) is not whether the values of the honest agent are similar to the UFAI’s. It is instead based on how that honest agent came to be.
If the honest torturer just happened to evolve that way (competitive social instincts plus a few mutations for psychopathy, etc.) and had not been influenced by a UFAI, then I’ll bribe him to not torture people. If an identical honest torturer was created (or modified) by the UFAI for the purpose of influence, then it doesn’t get cooperation.
The above may seem arbitrary, but the ‘elegant’ generalisation is something along the lines of always, for every decision, tracing a complete causal graph of the decision algorithms being interacted with, directly or indirectly. That’s too complicated to calculate all the time, so we can usually ignore it and just remember to treat intentionally created agents and self-modifications approximately the same as if the original agent were making the decision.
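To make that concrete, here is a minimal toy sketch of the heuristic, assuming a simple representation in which every agent records which other agents deliberately created or modified it. The `Agent` class, the `provenance` and `accept_trade` functions and the example agents are illustrative assumptions of mine, not standard TDT machinery.

```python
# Toy sketch of the provenance heuristic described above (illustrative only):
# refuse to trade with an agent if, tracing back how it came to exist, we find
# a threat-maker we do not want to reward.  Value-similarity alone is fine.

from dataclasses import dataclass, field


@dataclass
class Agent:
    name: str
    values: str                                   # e.g. "torture"
    # Agents that deliberately created or modified this one.
    created_by: list["Agent"] = field(default_factory=list)


def provenance(agent: Agent) -> set[str]:
    """Walk the created_by graph and collect every upstream shaper."""
    seen: set[str] = set()
    stack = list(agent.created_by)
    while stack:
        parent = stack.pop()
        if parent.name not in seen:
            seen.add(parent.name)
            stack.extend(parent.created_by)
    return seen


def accept_trade(counterparty: Agent, threat_makers: set[str]) -> bool:
    """Cooperate/pay only if no threat-maker appears in the causal history."""
    return provenance(counterparty).isdisjoint(threat_makers)


# The two honest torturers from the example: identical values, different history.
ufai = Agent("UFAI", values="torture")
evolved = Agent("evolved psychopath", values="torture")
manufactured = Agent("manufactured bargainer", values="torture", created_by=[ufai])

print(accept_trade(evolved, {"UFAI"}))       # True  -> bribe him to not torture
print(accept_trade(manufactured, {"UFAI"}))  # False -> no cooperation
```

The point the sketch captures is that the screen is on causal history rather than on values: the two torturers have identical utility functions and still get different answers.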
Thus, creating such an honest agent (such as—importantly—by self-modifying in order to “precommit”) is subject to the same incentives as just blackmailing us normally.
Precisely. (I have the same conclusion, just slightly different working out.)
As I understand it, technically, the distinction is whether torturers will realise they can get free utility from your trades and start torturing extra, so that the honest agents will trade more and receive rewards that also benefit the torturers, right?
Easily-made honest bargainers would just be the most likely of those situations; lots of wandering agents with the same utility function co-operating (acausally?) would be another. So the rule we would both apply is even the same; we just make slightly different assumptions about the hypothetical scenario.
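For what it’s worth, here is a toy way to put numbers on that worry. The figures are made up purely for illustration: if payments to honest rescuers also count as utility for the torturer (shared utility function), every extra deathtrap has positive marginal value to the torturer unless the trade policy screens that leak off.

```python
# Toy illustration (made-up numbers) of the "free utility" worry above: if the
# honest rescuers share the torturer's utility function, paying them leaks value
# back to the torturer, so creating extra deathtraps becomes profitable for it.

COST_OF_TRAP = 1.0      # utility the torturer spends per extra deathtrap
PAYMENT = 3.0           # what we pay an honest rescuer per trap resolved
SHARED_FRACTION = 0.6   # fraction of that payment the torturer also values


def torturer_marginal_utility(payment_leaks_to_torturer: bool) -> float:
    """Torturer's gain from setting one extra trap, under our trade policy."""
    leaked = PAYMENT * SHARED_FRACTION if payment_leaks_to_torturer else 0.0
    return leaked - COST_OF_TRAP


print(torturer_marginal_utility(True))   #  0.8 -> extra torture pays off
print(torturer_marginal_utility(False))  # -1.0 -> no incentive to set traps
```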