The practice of moral philosophy doesn’t much resemble the practice of mathematics. Mainly because in moral philosophy we don’t know exactly what we’re talking about when we talk about morality. In mathematics, particularly since the 20th century, we can eventually precisely specify what we mean by a mathematical object, in terms of sets.
“Morality is logic” means that when we talk about morality we are talking about a mathematical object. What makes moral philosophy so difficult and non-logic-y is that the only place in our minds where a reference to this object is stored is our intuition. In practice you can’t write down a complete syntactic description of morality, so in general¹ you can’t write syntactic proofs of theorems about morality either. This is not to say that such descriptions or proofs do not exist!
In practice moral philosophy proceeds by a kind of probabilistic reasoning, which might be analogized to the thinking that leads one to conjecture that P≠NP, except with even less rigor. I’d expect that things like the order of moral arguments mattering come down to framing effects and other biases which are always involved regardless of the subject, but don’t show up in mathematics so much because proofs leave little wiggle room.
¹ Of course, you may be able to write proofs that use only simple properties you can be fairly sure hold of morality without knowing its full description, but such properties are usually either quite boring, not widely agreed upon, or too specific to lead to interesting proofs. E.g. “It’s wrong to kill someone without their permission when there’s nothing to be gained by it.”
“Morality is logic” means that when we talk about morality we are talking about a mathematical object.
How does one go about defining this mathematical object, in principle? Suppose you were a superintelligence who could surmount any kind of technical difficulty, and you wanted to define a human’s morality precisely as a mathematical object: how would you do it?
I don’t really know the answer to that question.

In principle, you start with a human brain, and somehow extract from it a description of what it means when it says “morality”. Presumably this involves some kind of analysis of what would make the human say “that’s good!” or “that’s bad!”, and/or of which computational processes inside the brain are involved in deciding whether to say “good” or “bad”. The output is, in theory, a function mapping things to how much they match “good” or “bad” in your human’s language.
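To make the shape of that output concrete, here is a minimal C sketch. Everything in it (the example inputs, the scores, the name `morality`) is invented for illustration; the real object would be extracted from a brain, not hand-written:

```c
#include <stdio.h>
#include <string.h>

/* A toy stand-in for the extracted object: a function from descriptions of
   things to how strongly they match "good" (positive) or "bad" (negative)
   in the subject's language. */
static double morality(const char *thing)
{
    if (strcmp(thing, "helping a stranger") == 0) return 1.0;
    if (strcmp(thing, "gratuitous cruelty") == 0) return -1.0;
    return 0.0;   /* neutral, as far as this toy function knows */
}

int main(void)
{
    printf("%+.1f\n", morality("helping a stranger"));   /* +1.0 */
    printf("%+.1f\n", morality("gratuitous cruelty"));    /* -1.0 */
    return 0;
}
```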
The ‘simple’ solution, of just simulating what your human would say after being exposed to every possible moral argument, runs into trouble with what exactly constitutes an argument—if a UFAI can hack your brain into doing terrible things just by talking to you, clearly not all verbal engagement can be allowed—and also more mundane issues like our simulated human going insane from all this talking.
Suppose the “simple” solution doesn’t have the problems you mention. Somehow we get our hands on a human that doesn’t have security holes and can’t go insane. I still don’t think it works.
Let’s say you are trying to do some probabilistic reasoning about the mathematical object “foobar” and the definition of it you’re given is “foobar is what X would say about ‘foobar’ after being exposed to every possible argument concerning ‘foobar’”, where X is an algorithmic description of yourself. Well, as soon as you realize that X is actually a simulation of you, you can conclude that you can say anything about ‘foobar’ and be right. So why bother doing any more probabilistic reasoning? Just say anything, or nothing. What kind of probabilistic reasoning can you do beyond that, even if you wanted to?
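To make the collapse concrete, here is a minimal C sketch (all names and numbers are invented for illustration) of a “definition” that merely quotes the subject’s own output, and a subject who has noticed this:

```c
#include <stdio.h>

typedef int (*subject_fn)(void);

/* "foobar is whatever X (a simulation of the subject) says about 'foobar'":
   this definition just echoes the subject's own output. */
static int foobar_by_quotation(subject_fn simulate_subject)
{
    return simulate_subject();
}

/* A subject who has realised that X is a simulation of themselves:
   whatever they answer is "correct" by the definition above,
   so the definition gives them nothing to reason about. */
static int subject(void)
{
    return 456;   /* could just as well be 123, or anything else */
}

int main(void)
{
    printf("%d\n", foobar_by_quotation(subject));   /* prints 456 */
    return 0;
}
```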
I think you’re collapsing some levels here, but it’s making my head hurt to think about it, having the definition-deriver and the subject be the same person.
Making this concrete: let ‘foobar’ refer to the set {1, 2, 3} in a shared language used by us and our subject, Alice. Alice would agree that it is true that “foobar = what X would say about ‘foobar’ after being exposed to every possible argument concerning ‘foobar’” where X is some algorithmic description of Alice. She would say something like “foobar = {1, 2, 3}, X would say {1, 2, 3}, {1, 2, 3} = {1, 2, 3} so this all checks out.”
Clearly then, any procedure that correctly determines what X would say about ‘foobar’ should result in the correct definition of foobar, namely {1, 2, 3}. This is what theoretically lets our “simple” solution work.
However, Alice would not agree that “what X would say about ‘foobar’ after being exposed to every possible argument concerning ‘foobar’” is a correct definition of ‘foobar’. The issue is that this definition has the wrong properties when we consider counterfactuals concerning X. It is in fact the case that foobar is {1, 2, 3}, and further that ‘foobar’ means {1, 2, 3} in our current language, as stipulated at the beginning of this thought experiment. If-counterfactually X would say ‘{4, 5, 6}’, foobar is still {1, 2, 3}, because what we mean by ‘foobar’ is {1, 2, 3} and {1, 2, 3} is {1, 2, 3} regardless of what X says.
Having written that, I now think I can return to your question. The answer is that, firstly, by replacing the true definition “foobar = {1, 2, 3}” with “foobar is what X would say about ‘foobar’ after being exposed to every possible argument concerning ‘foobar’” in the subject’s mind, you have just deleted the only reference to foobar that actually exists in the thought experiment. The subject has to reason about ‘foobar’ using their built-in definition, since that is the only thing that actually points directly to the target object.
Secondly, as described above “foobar is what X would say about ‘foobar’ after being exposed to every possible argument concerning ‘foobar’” is an inaccurate definition of foobar when considering counterfactuals concerning what X would say about foobar. Which is exactly what you are doing when reasoning that “if-counterfactually I say {4, 5, 6} about foobar, then what X would say about ‘foobar’ is {4, 5, 6}, so {4, 5, 6} is correct.”
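As a toy illustration of the counterfactual point, here is a short C sketch, with 123 and 456 as my own stand-ins for {1, 2, 3} and {4, 5, 6}:

```c
#include <stdio.h>

/* 'foobar' is stipulated to rigidly mean 123 (standing in for {1, 2, 3}). */
#define FOOBAR 123

/* "What X would say about 'foobar'" merely tracks X's output,
   whatever it counterfactually happens to be. */
static int what_x_would_say(int x_output)
{
    return x_output;
}

int main(void)
{
    int counterfactual_outputs[] = {123, 456};   /* 456 stands in for {4, 5, 6} */
    for (int i = 0; i < 2; i++) {
        printf("foobar = %d, what X would say = %d\n",
               FOOBAR, what_x_would_say(counterfactual_outputs[i]));
    }
    /* In the second counterfactual the two come apart: foobar is still 123,
       but "what X would say" has moved to 456, so the quoted phrase is not
       a correct definition of 'foobar'. */
    return 0;
}
```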
Which is to say that, analogising, the contents of our subject’s head are a pointer (in the programming sense) to the object itself, while “what X would say about ‘foobar’ after being exposed to every possible argument concerning ‘foobar’” is a pointer to the first pointer. You can dereference it, and get the right answer, but you can’t just substitute it in for the first pointer. That gives you nothing but a pointer referring to itself.
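Spelled out as a minimal C sketch (the names and the value 123 are purely illustrative):

```c
#include <stdio.h>

int main(void)
{
    int object = 123;          /* the object itself (a stand-in for foobar) */
    int *head = &object;       /* the subject's head: a pointer to the object */
    int **derived = &head;     /* "what X would say about 'foobar'":
                                  a pointer to the first pointer */

    /* Dereferencing the derived definition does reach the right answer... */
    printf("%d\n", **derived); /* prints 123 */

    /* ...but substituting the derived definition in for the subject's own
       leaves nothing pointing at the object: just a pointer to itself. */
    void *substituted = &substituted;
    (void)substituted;

    return 0;
}
```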
ETA: Dear god, this turned into a long post. Sorry! I don’t think I can shorten it without making it worse though.
Right, so my point is that if your theory (that moral reasoning is probabilistic reasoning about some mathematical object) is to be correct, we need a definition of morality as a mathematical object which isn’t “what X says after considering all possible moral arguments”. So what could it be then? What definition Y can we give, such that it makes sense to say “when we reason about morality, we are really doing probabilistic reasoning about the mathematical object Y”?
Secondly, until we have a candidate definition Y at hand, we can’t show that moral reasoning really does correspond to probabilistic logical reasoning about Y. (And we’d also have to first understand what “probabilistic logical reasoning” is.) So, at this point, how can we be confident that moral reasoning does correspond to probabilistic logical reasoning about anything mathematical, and isn’t just some sort of random walk or some sort of reasoning that’s different from probabilistic logical reasoning?
Right, so my point is that if your theory (that moral reasoning is probabilistic reasoning about some mathematical object) is to be correct, we need a definition of morality as a mathematical object which isn’t “what X says after considering all possible moral arguments”. So what could it be then? What definition Y can we give, such that it makes sense to say “when we reason about morality, we are really doing probabilistic reasoning about the mathematical object Y”?
Unfortunately I doubt I can give you a short direct definition of morality. However, if such a mathematical object exists, “what X says after considering all possible moral arguments” should be enough to pin it down (disregarding the caveats to do with our subject going insane, etc.).
Secondly, until we have a candidate definition Y at hand, we can’t show that moral reasoning really does correspond to probabilistic logical reasoning about Y. (And we’d also have to first understand what “probabilistic logical reasoning” is.) So, at this point, how can we be confident that moral reasoning does correspond to probabilistic logical reasoning about anything mathematical, and isn’t just some sort of random walk or some sort of reasoning that’s different from probabilistic logical reasoning?
Well, I think it safe to assume I mean something by moral talk, otherwise I wouldn’t care so much about whether things are right or wrong. I must be talking about something, because that something is wired into my decision system. And I presume this something is mathematical, because (assuming I mean something by “P is good”) you can take the set of all good things, and this set is the same in all counterfactuals. Roughly speaking.
It is, of course, possible that moral reasoning isn’t actually any kind of valid reasoning, but does amount to a “random walk” of some kind, where considering an argument permanently changes your intuition in some nondeterministic way so that after hearing the argument you’re not even talking about the same thing you were before hearing it. Which is worrying.
Also it’s possible that moral talk in particular is mostly signalling intended to disguise our true values, which are very similar but more selfish. But that doesn’t make a lot of difference, since you can still cash out your values as a mathematical object of some sort.
It is, of course, possible that moral reasoning isn’t actually any kind of valid reasoning, but does amount to a “random walk” of some kind, where considering an argument permanently changes your intuition in some nondeterministic way so that after hearing the argument you’re not even talking about the same thing you were before hearing it. Which is worrying.
Yes, exactly. This seems to me pretty likely to be the case for humans. Even if it’s actually not the case, nobody has done the work to rule it out yet (has anyone even written a post making any kind of argument that it’s not the case?), so how do we know that it’s not the case? Doesn’t it seem to you that we might be doing some motivated cognition in order to jump to a comforting conclusion?
“what X says after considering all possible moral arguments”
I know you’re not arguing for this but I can’t help noting the discrepancy between the simplicity of the phrase “all possible moral arguments”, and what it would mean if it can be defined at all.
But then many things are “easier said than done”.