Subtle point: I believe the claim you’re drawing from was that it’s highly likely that the inputs to human values (i.e. the “things humans care about”) are natural abstractions.
To check that I understand the distinction between those two: inputs to human values are features of the environment around which our values are based. For example, the concept of liberty might be an important input to human values because the freedom to exercise your own will is a natural thing we would expect humans to want, whereas humans can differ greatly in things like (1) metaethics about why liberty matters, (2) the extent to which liberty should be traded off with other values, if indeed it can be traded off at all. People might disagree about interpretations of these concepts (especially different cultures), but in a world where these weren’t natural abstractions, we might expect disagreement in the first place to be extremely hard because the discussants aren’t even operating on the same wavelength, i.e. they don’t really have a set of shared concepts to structure their disagreements around.
One theme I notice throughout the “evidence” section is that it’s mostly starting from arguments that the NAH might not be true, then counterarguments, and sometimes counter-counterarguments.
Yeah, that’s a good point. I think partly that’s because my thinking about the NAH basically starts with “the inside view seems to support it, in the sense that the abstractions that I use seem natural to me”, and so from there I start thinking about whether this is a situation in which the inside view should be trusted, which leads to considering the validity of arguments against it (i.e. “am I just anthropomorphising?”).
However, here are a few specific reasons I think the NAH is plausible that don’t rely solely on the inside view:
Humans were partly selected for their ability to act in the world to improve their situations. Since abstractions are all about finding good high-level models that describe things you might care about and how they interact with the rest of the world, it seems like there should have been a competitive pressure for humans to find good abstractions. This argument doesn’t feel very contingent on the specifics of human cognition or what our simplicity priors are; rather, the abstractions should be a function of the environment (hence convergence to the same abstractions by other cognitive systems which are also under competition, e.g. in the form of computational efficiency requirements, seems intuitive).
There’s lots of empirical evidence that seems to support it, at least at a weak level (e.g. CLIP as discussed in my post, or GPT-3 as mentioned by Rohin in his summary for the newsletter)
Returning to the clarification you made about inputs to human values being the natural abstraction rather than the actual values, it seems like the fact that different cultures can have a shared basis for disagreement might support some form of the NAH rather than arguing against it? I guess that point has a few caveats though, e.g. (1) all cultures have been shaped significantly by global factors like European imperialism, and (2) humans are all very close together in mind design space, so we’d expect something like this anyway, natural abstraction or not.
Thanks for the comment!