It seems like you wanted me to respond to this comment, so I’ll write a quick reply.
Now for the rub: I think anyone working on AI alignment (or any technical question of comparable difficulty) mustn’t exhibit this attitude with respect to [the thing they’re working on]. If you have a problem where you’re not able to achieve high confidence in your own models of something (relative to competing ambient models), you’re not going to be able to follow your own thoughts far enough to do good work—not without being interrupted by thoughts like “But if I multiply the probability of this assumption being true, by the probability of that assumption being true, by the probability of that assumption being true...” and “But [insert smart person here] thinks this assumption is unlikely to be true, so what probability should I assign to it really?”
This doesn’t seem true for me. I think through details of exotic hypotheticals all the time.
Maybe others are different. But it seems like maybe you’re proposing that people self-deceive in order to get themselves confident enough to explore the ramifications of a particular hypothesis. I think we should be a bit skeptical of intentional self-deception. And if self-deception is really necessary, let’s make it a temporary suspension of belief sort of thing, as opposed to a lasting belief that leads you to stop talking to people with other views.
It’s been a while since I read Inadequate Equilibria. But I remember the message of the book being fairly nuanced. For example, it seems pretty likely to me that there’s no specific passage which contradicts the statement “hedgehogs make better predictions on average than foxes”.
I support people trying to figure things out for themselves, and I apologize if I unintentionally discouraged anyone from doing that; it wasn’t my intention. I also think people consider learning from disagreement to be virtuous for a good reason, not just due to “epistemic learned helplessness”. Also, learning from disagreement seems importantly different from generic deference, especially if you took the time to learn the other side’s views and found yourself unpersuaded. Basically, I think people should account for both known unknowns (in the form of people who disagree whose views you don’t understand) and unknown unknowns, but it seems OK to not defer to the masses / defer to authorities if you have a solid grasp of how they came to their conclusion (this is my attempt to restate the thesis of Inadequate Equilibria as I remember it).
I don’t deny that learning from disagreement has costs. Probably some people do it too much. I encouraged MIRI to do it more on the margin, but it could be that my guess about their current margin is incorrect, who knows.
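The quoted worry about multiplying “the probability of this assumption being true, by the probability of that assumption being true” is the conjunction rule. A minimal sketch, assuming the assumptions are independent and using made-up credences of 0.8 each (illustrative only, not from either comment), shows how quickly the product falls as a chain of reasoning grows:

```python
from math import prod

# Hypothetical credences for assumptions a line of reasoning relies on.
# These numbers are illustrative only.
assumption_credences = [0.8, 0.8, 0.8, 0.8, 0.8]

def chain_credence(credences):
    """Probability that every assumption holds, treating them as independent."""
    return prod(credences)

# Print the credence in the whole chain as assumptions are added one at a time.
for k in range(1, len(assumption_credences) + 1):
    print(k, round(chain_credence(assumption_credences[:k]), 3))
```

Correlated assumptions would not simply multiply, so this overstates the decay when the assumptions support one another.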
But it seems like maybe you’re proposing that people self-deceive in order to get themselves confident enough to explore the ramifications of a particular hypothesis. I think we should be a bit skeptical of intentional self-deception.
I want to clarify that this is not my proposal, and to the extent that it had been someone’s proposal, I would be approximately as wary about it as you are. I think self-deception is quite bad on average, and even on occasions when it’s good, that fact isn’t predictable in advance, making choosing to self-deceive pretty much always a negative expected-value action.
The reason I suspect you interpreted this as my proposal is that you’re speaking from a frame where “confidence in one’s model” basically doesn’t happen by default, so to get there people need to self-deceive, i.e. there’s no way for someone [in a sufficiently “hard” domain] to have a model and be confident in that model without doing [something like] artificially inflating their confidence higher than it actually is.
I think this is basically false. I claim that having (real, not artificial) confidence in a given model (even of something “hard”) is entirely possible, and moreover happens naturally, as part of the process of constructing a gears-level model to begin with. If your gears-level model actually captures some relevant fraction of the problem domain, I claim it will be obvious that it does so, and therefore a researcher holding that model would be very much justified in placing high confidence in [that part of] their model.
How much should such a researcher be swayed by the mere knowledge that other researchers disagree? I claim the ideal answer is “not at all”, for the same reason that argument screens off authority. And I agree that, from the perspective of somebody on the outside (who only has access to the information that two similarly-credentialed researchers disagree, without access to the gears in question), this can look basically like self-deception. But (I claim) from the inside the difference is very obvious, and not at all reminiscent of self-deception.
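“Argument screens off authority” is a claim about conditional independence: if an authority’s verdict depends on the truth only through the argument, then once you have evaluated the argument yourself, learning the verdict should not move you. The toy model below is a hypothetical illustration with invented numbers, not anything from the discussion. `H` is the hypothesis, `A` is “the argument checks out”, and `E` is “a respected expert endorses it”, wired as a chain H → A → E.

```python
from fractions import Fraction as F
from itertools import product

# Hypothetical toy model (invented numbers). The expert's endorsement E
# depends on H only through the argument A, i.e. the chain H -> A -> E.
P_H = {1: F(1, 2), 0: F(1, 2)}            # prior P(H=h)
P_A_given_H = {1: F(4, 5), 0: F(1, 5)}    # P(A=1 | H=h)
P_E_given_A = {1: F(9, 10), 0: F(1, 10)}  # P(E=1 | A=a)

def bern(p_one, value):
    """Probability a binary variable equals `value` when P(variable=1) = p_one."""
    return p_one if value == 1 else 1 - p_one

def joint(h, a, e):
    """Joint probability P(H=h, A=a, E=e) under the chain model."""
    return P_H[h] * bern(P_A_given_H[h], a) * bern(P_E_given_A[a], e)

def prob_h_true(**evidence):
    """P(H=1 | evidence), by brute-force enumeration of the joint distribution."""
    num = den = F(0)
    for h, a, e in product((0, 1), repeat=3):
        world = {"h": h, "a": a, "e": e}
        if any(world[k] != v for k, v in evidence.items()):
            continue  # skip worlds inconsistent with the evidence
        p = joint(h, a, e)
        den += p
        if h == 1:
            num += p
    return num / den

print(prob_h_true())          # prior: 1/2
print(prob_h_true(e=1))       # endorsement alone: 37/50
print(prob_h_true(a=1))       # argument checks out: 4/5
print(prob_h_true(a=1, e=1))  # argument + endorsement: 4/5
print(prob_h_true(a=1, e=0))  # argument + disagreement: 4/5
```

With these numbers the endorsement alone moves the probability from 1/2 to 37/50, but once the argument is known, adding either endorsement or disagreement leaves it at 4/5. The screening depends entirely on the assumed chain structure: if the expert’s verdict also draws on evidence you lack (gears you are missing), it is no longer screened off.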
(Some fields do not admit good gears-level models at all, and therefore it’s basically impossible to achieve the epistemic state described above. People in such fields might plausibly imagine that all fields have this property. But this isn’t the case; in fact, I would argue that the division of the sciences into “harder” and “softer” is pointing at precisely this distinction: the “hardness” attributed to a field is a measure of how possible it is to form a strong gears-level model.)
Does this mean “learning from disagreement” is useless? Not necessarily; gears-level models can also be wrong and/or incomplete, and one entirely plausible (and sometimes quite useful) mechanism by which to patch up incomplete models is to exchange gears with someone else, who may not be working with quite the same toolbox as you. But (I claim) for this process to actually help, it should be done in a targeted way: ideally, you go into the conversation already knowing what you hope to get out of it, having picked your partner beforehand for their likelihood of having gears you personally are missing. If you’re merely “seeking out disagreement” for the purpose of fulfilling a quota, that (I claim) is unlikely to lead anywhere productive. (And I view your exhortations for MIRI to “seek out more disagreement on the margin” as proposing essentially just such a quota.)
(Standard disclaimer: I am not affiliated with MIRI, and my views do not necessarily reflect their views, etc.)