This post really brings to light an inkling I had a while ago: TDT feels vaguely Kantian.
Compare:
“Choose as though controlling the logical output of the abstract computation you implement, including the output of all other instantiations and simulations of that computation.”
“Act only according to that maxim whereby you can at the same time will that it should become a universal law without contradiction.”
Now, they’re clearly not the same but they are similar enough that we shouldn’t be surprised that consequentialism under TDT alleviates some of our concerns about traditional consequentialism. I find this exciting—but it also makes me suspicious. Kantian moral theory has some serious problems and so I wonder if there might be analogous issues in CON+TDT. And I see some. I’ll leave out the Kantian equivalent unless someone is interested:
“What happens in general if everyone at least as smart as I am deduces that I would do X whenever I’m in situation Y?”
The problem is that no two situations are strictly speaking identical (putting aside exact simulations and other universes). That means CON+TDT doesn’t prohibit a decision to carve up a vagrant for organs conditional on some unique feature of the situation. Or put it this way: can TDT correctly model intuitions about moral scenarios that cannot plausibly be replicated?
There are lots of possible agents that closely resemble TDT agents but with slight differences that would make them difficult to model. It seems like, absent complete transparency, rational agents should publicly endorse and abide by TDT but deviate to the extent they can get away with it.
The relevant scenarios can be too broad as well. For example, say you think a violent response to Hitler was justified. Unless Hitler is resurrected, that algorithm won’t get used again (save simulations and other universes). But if we pick something broader that can be multiply instantiated, we can run into trouble: “Violent response to leaders we disagree with is justified.”
So is there a rigorous way of determining classes of decisions?
You’ve identified a subtle problem with implementing decision theories, but the answer that the commonsense version of TDT “should” give is pretty clear: if the differences between two situations don’t massively affect the utilities involved (from the perspective of the deciding agent), then they should belong to the same reference class.
If you shouldn’t kill the patient in a house, then you still shouldn’t kill him with a mouse, or on a boat, with a goat, on a train, in the rain, here or there, or anywhere.
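For concreteness, here is a minimal Python sketch of that heuristic. The threshold, the utility numbers, and all the names are invented for illustration and come from no actual TDT formalization: two situations land in the same reference class if no available action’s utility differs between them by more than a small epsilon.

    # Minimal sketch of the "same reference class if the utilities barely
    # differ" heuristic. All utilities below are made-up toy numbers.

    def same_reference_class(situation_a, situation_b, utility, actions, epsilon=0.01):
        """Treat two situations as the same decision problem if no action's
        utility differs between them by more than epsilon."""
        return all(
            abs(utility(situation_a, act) - utility(situation_b, act)) <= epsilon
            for act in actions
        )

    actions = ["harvest_organs", "do_not_harvest"]

    def utility(situation, action):
        # Hypothetical utilities for the transplant case, plus a negligible
        # perturbation standing in for irrelevant details like location.
        base = {"harvest_organs": -100.0, "do_not_harvest": 0.0}[action]
        return base + situation.get("irrelevant_noise", 0.0)

    in_a_house = {"location": "house", "irrelevant_noise": 0.001}
    on_a_boat = {"location": "boat", "irrelevant_noise": -0.002}

    print(same_reference_class(in_a_house, on_a_boat, utility, actions))  # True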
This is somewhat circular. It works for the example, but not knowing how similar two situations have to be before similar decisions produce similar utilities is part of the problem.
TDT seems to sometimes fail this, depending on whether the other scenario counts as the ‘same computation’, whatever that means. UDT postulates the existence of a ‘mathematical intuition module’ that can tell which computations influence which others, but it is not known how such a module could be created. Developing one would probably constitute a large fraction of the difficulty of creating AGI.
That means CON+TDT doesn’t prohibit a decision to carve up a vagrant for organs conditional on some unique feature of the situation.
Provided that the unique feature is relevant, no, it does not. For example, if the vagrant’s parts were capable of saving 1,000 lives (a very unlikely situation, and not one anyone needs to worry about finding themselves in), that would be a relevant unique feature.
However, merely noticing that the vagrant is wearing a red baseball cap made in 1953 and has $1.94 in their left pants pocket, while unique, is irrelevant. As such, it is easily modelled by using the protocol “insert random, irrelevant, unique aspect”.
No disagreement about the relevance of baseball caps to organ transplantation, but if TDT is defined using “all other instantiations and simulations of that computation”, any small difference, however irrelevant, may exclude the agent from the category of instantiations of the same computation. The obvious countermeasure would be to ask TDT to include the outputs not only of other instantiations of itself, but of a broader class of agents which behave similarly in all relevant aspects (in the given situation). Which leads to the question of how to precisely define “relevant”, which, as far as I understand, is what the parent comment is asking.
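To make that countermeasure concrete, here is a toy Python sketch. All the names, and especially the fixed relevant_features set, are invented for illustration: it buckets decision procedures by what they output once irrelevant details are stripped, which pushes the entire difficulty into choosing relevant_features.

    # Toy sketch: two decision procedures count as "the same computation"
    # for TDT purposes if they choose identically on every situation once
    # irrelevant details are stripped. Choosing relevant_features is the
    # hand-waved step, i.e. exactly the open question in this thread.

    def project(situation, relevant_features):
        """Keep only the features deemed relevant to the decision."""
        return frozenset((k, v) for k, v in situation.items() if k in relevant_features)

    def same_relevant_computation(agent_a, agent_b, situations, relevant_features):
        """True if both agents decide identically on the projected situations."""
        return all(
            agent_a(project(s, relevant_features)) == agent_b(project(s, relevant_features))
            for s in situations
        )

    # Hypothetical agents that only care how many lives are at stake.
    agent_a = lambda s: "harvest" if dict(s).get("lives_saved", 0) >= 1000 else "spare"
    agent_b = lambda s: "harvest" if dict(s).get("lives_saved", 0) >= 1000 else "spare"

    situations = [
        {"lives_saved": 5, "cap_color": "red"},
        {"lives_saved": 5, "cap_color": "blue"},  # differs only in an irrelevant detail
    ]
    print(same_relevant_computation(agent_a, agent_b, situations, {"lives_saved"}))  # True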