Suppose the “simple” solution doesn’t have the problems you mention. Somehow we get our hands on a human that doesn’t have security holes and can’t go insane. I still don’t think it works.
Let’s say you are trying to do some probabilistic reasoning about the mathematical object “foobar” and the definition of it you’re given is “foobar is what X would say about ‘foobar’ after being exposed to every possible argument concerning ‘foobar’”, where X is an algorithmic description of yourself. Well, as soon as you realize that X is actually a simulation of you, you can conclude that you can say anything about ‘foobar’ and be right. So why bother doing any more probabilistic reasoning? Just say anything, or nothing. What kind of probabilistic reasoning can you do beyond that, even if you wanted to?
I think you’re collapsing some levels here, though it makes my head hurt to think about it when the definition-deriver and the subject are the same person.
Making this concrete: let ‘foobar’ refer to the set {1, 2, 3} in a shared language used by us and our subject, Alice. Alice would agree that it is true that “foobar = what X would say about ‘foobar’ after being exposed to every possible argument concerning ‘foobar’” where X is some algorithmic description of Alice. She would say something like “foobar = {1, 2, 3}, X would say {1, 2, 3}, {1, 2, 3} = {1, 2, 3} so this all checks out.”
Clearly then, any procedure that correctly determines what X would say about ‘foobar’ should yield the correct referent of ‘foobar’, namely {1, 2, 3}. This is what theoretically lets our “simple” solution work.
However, Alice would not agree that “what X would say about ‘foobar’ after being exposed to every possible argument concerning ‘foobar’” is a correct definition of ‘foobar’. The issue is that this definition has the wrong properties when we consider counterfactuals concerning X. It is in fact the case that foobar is {1, 2, 3}, and further that ‘foobar’ means {1, 2, 3} in our current language, as stipulated at the beginning of this thought experiment. If-counterfactually X would say ‘{4, 5, 6}’, foobar is still {1, 2, 3}, because what we mean by ‘foobar’ is {1, 2, 3} and {1, 2, 3} is {1, 2, 3} regardless of what X says.
Having written that, I now think I can return to your question. The answer is that firstly, by replacing the true definition “foobar = {1, 2, 3}” with “foobar is what X would say about ‘foobar’ after being exposed to every possible argument concerning ‘foobar’” in the subject’s mind, you have just deleted the only reference to foobar that actually exists in the thought experiment. The subject has to reason about ‘foobar’ using their built-in definition, since that is the only thing that actually points directly to the target object.
Secondly, as described above, “foobar is what X would say about ‘foobar’ after being exposed to every possible argument concerning ‘foobar’” is an inaccurate definition of foobar when considering counterfactuals concerning what X would say about foobar. Which is exactly what you are doing when reasoning that “if-counterfactually I say {4, 5, 6} about foobar, then what X would say about ‘foobar’ is {4, 5, 6}, so {4, 5, 6} is correct.”
Which is to say that, analogising, what’s in our subject’s head is a pointer (in the programming sense) to the object itself, while “what X would say about ‘foobar’ after being exposed to every possible argument concerning ‘foobar’” is a pointer to the first pointer. You can dereference it, and get the right answer, but you can’t just substitute it in for the first pointer. That gives you nothing but a pointer referring to itself.
ETA: Dear god, this turned into a long post. Sorry! I don’t think I can shorten it without making it worse though.
Right, so my point is that if your theory (that moral reasoning is probabilistic reasoning about some mathematical object) is to be correct, we need a definition of morality as a mathematical object which isn’t “what X says after considering all possible moral arguments”. So what could it be then? What definition Y can we give, such that it makes sense to say “when we reason about morality, we are really doing probabilistic reasoning about the mathematical object Y”?
Secondly, until we have a candidate definition Y at hand, we can’t show that moral reasoning really does correspond to probabilistic logical reasoning about Y. (And we’d also have to first understand what “probabilistic logical reasoning” is.) So, at this point, how can we be confident that moral reasoning does correspond to probabilistic logical reasoning about anything mathematical, and isn’t just some sort of random walk or some sort of reasoning that’s different from probabilistic logical reasoning?
Right, so my point is that if your theory (that moral reasoning is probabilistic reasoning about some mathematical object) is to be correct, we need a definition of morality as a mathematical object which isn’t “what X says after considering all possible moral arguments”. So what could it be then? What definition Y can we give, such that it makes sense to say “when we reason about morality, we are really doing probabilistic reasoning about the mathematical object Y”?
Unfortunately I doubt I can give you a short direct definition of morality. However if such a mathematical object exists, “what X says after considering all possible moral arguments” should be enough to pin it down (disregarding the caveats to do with our subject going insane, etc).
Secondly, until we have a candidate definition Y at hand, we can’t show that moral reasoning really does correspond to probabilistic logical reasoning about Y. (And we’d also have to first understand what “probabilistic logical reasoning” is.) So, at this point, how can we be confident that moral reasoning does correspond to probabilistic logical reasoning about anything mathematical, and isn’t just some sort of random walk or some sort of reasoning that’s different from probabilistic logical reasoning?
Well, I think it’s safe to assume I mean something by moral talk, otherwise I wouldn’t care so much about whether things are right or wrong. I must be talking about something, because that something is wired into my decision system. And I presume this something is mathematical, because (assuming I mean something by “P is good”) you can take the set of all good things, and this set is the same in all counterfactuals. Roughly speaking.
It is, of course, possible that moral reasoning isn’t actually any kind of valid reasoning, but does amount to a “random walk” of some kind, where considering an argument permanently changes your intuition in some nondeterministic way so that after hearing the argument you’re not even talking about the same thing you were before hearing it. Which is worrying.
Also it’s possible that moral talk in particular is mostly signalling intended to disguise our true values which are very similar but more selfish. But that doesn’t make a lot of difference since you can still cash out your values as a mathematical object of some sort.
It is, of course, possible that moral reasoning isn’t actually any kind of valid reasoning, but does amount to a “random walk” of some kind, where considering an argument permanently changes your intuition in some nondeterministic way so that after hearing the argument you’re not even talking about the same thing you were before hearing it. Which is worrying.
Yes, exactly. This seems to me pretty likely to be the case for humans. Even if it’s actually not the case, nobody has done the work to rule it out yet (has anyone even written a post making any kind of argument that it’s not the case?), so how do we know that it’s not the case? Doesn’t it seem to you that we might be doing some motivated cognition in order to jump to a comforting conclusion?
“what X says after considering all possible moral arguments”
I know you’re not arguing for this but I can’t help noting the discrepancy between the simplicity of the phrase “all possible moral arguments”, and what it would mean if it can be defined at all.
But then many things are “easier said than done”.