Hm, I understood the traditional Less Wrong view to be something along the lines of: there is truth about the world, and that truth is independent of your values. Wanting something to be true won’t make it so. Whereas I’d expect a postmodernist to say something like: the Christians have their truth, the Buddhists have their truth, and the Atheists have theirs. Whose truth is the “real” truth comes down to the preferences of the individual. Your statement sounds more in line with the postmodernist view than the Less Wrong one.
This matters because if the Less Wrong view of the world is correct, it’s more likely that there are clean mathematical algorithms for thinking about and sharing truth that are value-neutral (or at least value-orthogonal, e.g. “aim to share facts that the student will find maximally interesting or surprising”. Note that this doesn’t need to be implemented in a way that would select for sharing a “fact” which triggers an epileptic fit and causes the student to hit the “maximally interesting” button. If I have a rough model of the user’s current beliefs and preferences, I could use that to estimate the VoI of various bits of information to the user and use that as my selection criterion. The point being that our objective doesn’t need to be defined in terms of “aiming for a particular set of mental states”.)
I don’t think this is correct—it misses the key map-territory distinction in the human mind. Even though there is “truth” in an objective sense, there is no necessity that the human mind can think about or share that truth. Obviously we can say that experientially we have something in our heads that correlates with reality, but that doesn’t imply that we can think about truth without implicating values. It also says nothing about whether we can discuss truth without manipulating the brain to represent things differently—and all imperfect approximations require trade-offs. If you want to train the brain to do X, you’re implicitly prioritizing some aspect of the brain’s approximation of reality over others.
Yep. There are a number of intelligent agents, each with their own subset of true beliefs. Since agents have finite resources, they cannot learn everything, and so their subset of true beliefs must be either random or guided by some set of goals or values. So truth is entangled with value in that sense, even if not in the sense of wishful thinking.
Also, there is no evidence of any kind of One Algorithm To Rule Them All. It’s in no way implied by the existence of objective reality, and everything that has been exhibited along those lines has turned out to be computationally intractable.
What’s your answer to the postmodernist?
That they make some sensible points, but they’re wrong when they push them too far (and they mix factual truths with preferences a lot). Christians do have their own “truths”, if we interpret these truths as values, which is what they generally are. “It is a sin to engage in sex before marriage” vs “(some) sex can lead to pregnancy”. If we call both of these “truths”, then we have a confusion.
Right, both of these views on truth, traditional rationality and postmodernism, result in theories of truth that don’t quite line up with what we see in the world, but in different ways. The traditional rationality view fails to account for the fact that humans judge truth and have no access to the view from nowhere, so it is “wrong” in the sense that it incorrectly assumes it can gain privileged access to the truth of claims and thereby know which ones are facts and which are falsehoods. The postmodernist view makes an opposite, and only slightly lesser, mistake: it correctly notices that humans judge truth, but then fails to adequately account for the ways those judgements are entangled with a shared reality. The way through is to see both that there is something shared out there about which there can, in theory, be a fact of the matter, and that we can’t directly ascertain those facts, because we must do so across the gap of (subjective) experience.
As always, I say it comes back to the problem of the criterion and our failure to adequately accept that it demands we make a leap of faith, small though we may manage to make it.
Humans have beliefs and values twisted together in all kinds of odd ways. In practice, increasing our understanding tends to go along with having a more individualist outlook, a greater power to impact the natural world, less concern about difficult-to-measure issues, and less respect for traditional practices and group identities (and often the creation of new group identities, and sometimes new traditions).
Now, I find those changes to be (generally) positive, and I’d like them to be more common. But these are value changes, and I understand why people with different values could object to them.
Your original argument, as I understood it, was something like: Explanation aims for a particular set of mental states in the student, which is also what manipulation does, and therefore explanation can’t be defined in a way that distinguishes it from manipulation. I pushed back on that. Now you’re saying that explanation tends to produce side effects in the listener’s values. Does this mean you’re allowing the possibility that explanation can be usefully defined in a way that distinguishes it from manipulation?
BTW, computer security researchers distinguish between “reject by default” (whitelisting) and “accept by default” (blacklisting). “Reject by default” is typically more secure. I’m more optimistic about trying to specify what it means to explain something (whitelisting) than what it means to manipulate someone in a way that’s improper (blacklisting). So maybe we’re shooting at different targets.
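The asymmetry between the two defaults can be sketched in a toy example (the command names and rule sets here are hypothetical, chosen only to illustrate the policies):

```python
# Toy illustration of the two default policies (hypothetical rule sets).
ALLOWED = {"status", "help"}     # whitelist: explicitly approved items only
BLOCKED = {"rm", "shutdown"}     # blacklist: explicitly known-bad items only

def whitelist_check(cmd: str) -> bool:
    """Reject by default: a command passes only if explicitly approved."""
    return cmd in ALLOWED

def blacklist_check(cmd: str) -> bool:
    """Accept by default: a command passes unless explicitly banned,
    including novel commands nobody anticipated."""
    return cmd not in BLOCKED
```

A command nobody anticipated, say "format_disk", sails through the blacklist but is stopped by the whitelist. That failure mode is why reject-by-default tends to be more secure, and it’s the same reason specifying “explanation” (a whitelist) might be more tractable than specifying every form of improper manipulation (a blacklist).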
Tying all of this back to FAI… you say you find the value changes that come with greater understanding to be (generally) positive and you’d like them to be more common. I’m worried about the possibility that AGI will be a global catastrophic risk. I think there are good arguments that by default, AGI will be something which is not positive. Maybe from a triage point of view, it makes sense to focus on minimizing the probability that AGI is a global catastrophic risk, and worry about the promotion of things that we think are likely to be positive once we’re pretty sure the global catastrophic risk aspect of things has been solved?
In Eliezer’s CEV paper, he writes:

In poetic terms, our coherent extrapolated volition is our wish if we knew more, thought faster, were more the people we wished we were, had grown up farther together; where the extrapolation converges rather than diverges, where our wishes cohere rather than interfere; extrapolated as we wish that extrapolated, interpreted as we wish that interpreted.
I haven’t seen anyone on Less Wrong argue against CEV as a vision for how the future of humanity should be determined. And CEV seems to involve having the future be controlled by humans who are more knowledgeable than current humans in some sense. But maybe you’re a CEV skeptic?
Well, now you’ve seen one ^_^ : https://www.lesswrong.com/posts/vgFvnr7FefZ3s3tHp/mahatma-armstrong-ceved-to-death
I’ve been going on about the problems with CEV (specifically with extrapolation) for years. This post could also be considered a CEV critique: https://www.lesswrong.com/posts/WeAt5TeS8aYc4Cpms/values-determined-by-stopping-properties
I think explanation can be defined (see https://agentfoundations.org/item?id=1249 ). I’m not confident “explanation with no manipulation” can be defined.
IMO, VoI is also not a sufficient criterion for defining manipulation… I’ll list a few problems I have with it, OTTMH:
1) It seems to reduce it to “providing misinformation, or providing information to another agent that is not maximally/sufficiently useful for them (in terms of their expected utility)”. An example (due to Mati Roy) of why this doesn’t seem to match our intuition is: what if I tell someone something true and informative that serves (only) to make them sadder? That doesn’t really seem like manipulation (although you could make a case for it).
2) I don’t like the “maximally/sufficiently” part; maybe my intuition is misleading, but manipulation seems like a qualitative thing to me. Maybe we should just constrain VoI to be positive?
3) Actually, it seems weird to talk about VoI here; VoI is prospective and subjective… it treats an agent’s beliefs as real and asks how much value they should expect to get from samples or perfect knowledge, assuming these samples or the ground truth would be distributed according to their beliefs; this makes VoI strictly non-negative. But when we’re considering whether to inform an agent of something, we might recognize that certain information we’d provide would actually be net negative (see my top level comment for an example). Not sure what to make of that ATM...
re: #2, VoI doesn’t need to be constrained to be positive. If in expectation you think the information will have a net negative impact, you shouldn’t get the information.
re: #3, of course VoI is subjective. It MUST be, because value is subjective. Spending 5 minutes to learn about the contents of a box you can buy is obviously more valuable to you than to me. Similarly, if I like chocolate more than you, finding out if a cake has chocolate is more valuable for me than for you. The information is the same, the value differs.
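The chocolate-cake case can be made concrete with a toy decision problem (all numbers are made up for illustration): an agent deciding whether to buy a cake at some cost, who can optionally learn first whether it contains chocolate. Because the VoI calculation uses only the agent’s own belief and own utilities, it differs between agents, and it never comes out negative:

```python
def voi_perfect_info(p_choc, u_choc, u_plain, cost):
    """VoI of learning whether the cake is chocolate before a buy/pass choice,
    computed under the agent's own belief p_choc and own utilities."""
    # Best expected utility acting on the prior alone (passing yields 0):
    eu_without = max(p_choc * u_choc + (1 - p_choc) * u_plain - cost, 0.0)
    # Expected utility when the buy/pass choice is made after learning the truth:
    eu_with = (p_choc * max(u_choc - cost, 0.0)
               + (1 - p_choc) * max(u_plain - cost, 0.0))
    return eu_with - eu_without

# Same information, different values: a chocolate lover vs. an indifferent agent.
lover = voi_perfect_info(0.5, u_choc=10.0, u_plain=2.0, cost=4.0)
indifferent = voi_perfect_info(0.5, u_choc=5.0, u_plain=5.0, cost=4.0)
```

Here the lover’s VoI is 1.0 while the indifferent agent’s is 0.0 (the information can’t change their decision), and sweeping p_choc over [0, 1] never produces a negative value — which is exactly the non-negativity that point (3) in the list above turns on.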
FWICT, both of your points are actually responses to my point (3).
RE “re: #2”, see: https://en.wikipedia.org/wiki/Value_of_information#Characteristics
RE “re: #3”, my point was that it doesn’t seem like VoI is the correct way for one agent to think about informing ANOTHER agent. You could just look at the change in expected utility for the receiver after updating on some information, but I don’t like that way of defining it.