Humans find lying difficult and unnatural due to our specific evolutionary history. Alex’s design and training wouldn’t necessarily replicate those kinds of evolutionary pressures.
My understanding is that you’re saying “Selection pressures against lying-in-particular made lying hard for humans, and if we don’t have that same type of selection pressure for AI, the AI is unlikely to find lying to be unnatural or difficult.” Given that understanding, I think there are two mistakes here:
As far as I can fathom, evolution is not directly selecting over high-level cognitive properties, like “lying is hard for this person”, in the same way that it can “directly” select over alleles that are transcribed into different mRNA sequences. High-level mental phenomena are probably downstream of human learning architecture, relatively nonspecific reward circuitry, regional learning hyperparameters, and so on.
Evolution’s influence on human cognition is basically screened off by a person’s genome. Evolutionary pressure is not, of course, directly “reaching in” and modifying human behavior. Human behavior is the product of the human learning process, so the relevant question is whether the within-lifetime learning phenomena which make lying hard for humans will also make lying hard for AI.
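To put the “screened off” claim slightly more formally (this is my gloss, using the standard conditional-independence reading and ignoring gene-environment interaction effects): once you condition on an individual’s genome and environment, their evolutionary history carries no additional information about their behavior,

$$P(\text{behavior} \mid \text{genome}, \text{environment}, \text{evolutionary history}) = P(\text{behavior} \mid \text{genome}, \text{environment}).$$

Evolutionary explanations still matter for why the genome is the way it is; they just aren’t a separate causal channel into within-lifetime cognition.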
But also, it’s doubtful that human within-lifetime learning phenomena are the only way to produce an agent for whom lying is difficult. We should consider a wider range of potential causes for lying-difficulty, and whether any of them are workable for AI. Otherwise, it’s like saying “The sidewalk won’t be wet because I won’t turn on the sprinkler”: the sprinkler is only one way to make the sidewalk wet.
I think we might agree, though, that “how does lying-difficulty happen for humans?” is the best first place to look, if we want to ask that question. It’s just that IMO “Evolution caused it” doesn’t explain anything here; I don’t feel any less confused about the mechanisms after reading that purported explanation.
Hm, not sure I understand, but I wasn’t trying to make super specific mechanistic claims here. I agree that what I said doesn’t reduce confusion about the specific internal mechanisms of how lying gets to be hard for most humans, but I wasn’t intending to claim that it did. I also should have said something like “evolutionary, cultural, and individual history” instead (I was using “evolution” as shorthand to indicate that lying-aversion seems common across cultures, but of course that doesn’t mean don’t-lie genes are directly bred into us! Most human universals aren’t directly genetic; we probably don’t have honor-the-dead genes or different-words-for-male-and-female genes).
I was just making the pretty basic point “AIs in general, and Alex in particular, are produced through a very different process from humans, so it seems like ‘humans find lying hard’ is pretty weak evidence that ‘AI will by default find lying hard.’”
I agree that asking “What specific neurological phenomena make it so most people find it hard to lie?” could serve as inspiration to do AI honesty research, and wasn’t intending to claim otherwise in that paragraph (though separately, I am somewhat pessimistic about this research direction).
Do you not place much weight on the “Elephant in the Brain” hypothesis? Under that hypothesis, humans lie all the time. The small part of us that is our conscious persona believes the justifications it gives, which makes us think “humans are quite honest”. I mostly buy this view, so I’m not too confident that our learning algorithms provide evidence for lying being uncommon.
But I feel like the “humans have a powerful universal learning algorithm” view might not mix well with the “Elephant in the Brain” hypothesis as it is commonly understood, though I haven’t fully thought out that idea.
I find lying easy and natural unless I’m not sure if I can get away with it, and I think I’m more honest than the median person!
(Not a lie)
(Honestly)