Given that we are not perfect Bayesians and cannot necessarily expect to converge enough via Aumann to count as "agreeing", is there much reason to think optimizing on agreement makes sense at all? That is, setting aside these well-thought-out lines of reasoning for how optimizing on agreement can fail, it seems we should not expect agreement to be a strong measure of reasoning ability in the first place.
Empirically I very commonly see people around me (and myself) trying to agree, which is one of the reasons I wrote this post.
And that’s not obviously wrong. I’m more in favor of sharing models and info than of trying to nail down perfect agreement on a variable, but it seems fairly sensible to me to use disagreement as a sign of a place where model-sharing has high returns on investment. The reason this post is necessary is not because optimising for agreement is terrible, but because it’s useful. Yet I think that many people fall into the failure modes of causal goodharting.