RFC on an open problem: how to determine probabilities in the face of social distortion
So, I have a problem that I want help with, but I don’t want to focus on the object-level thing except as a particularly sharp example of the abstract problem. Please only throw suggestions here if you have a reasonable understanding of Bayes’ rule, *and* grok the problem I’m pointing at.
First, the problem I’m ACTUALLY trying to solve: If I believe I solved a practical problem to the satisfaction of its stakeholders, what is the probability that my belief is correct?
Now, the problem *around* the problem I’m actually trying to solve:
I can look at a solution I implemented to a practical problem, and judge whether I’m satisfied with it or not. BUT, often there are more actual stakeholders than just me. These stakeholders ALSO have opinions about whether I implemented the solution to their satisfaction.
Note I said “have opinions about”, *not* “have information about”. This is an important distinction, as you’re about to see.
So. Let’s say I have a confidence probability of p=0.65 that I performed a particular task well.
There are five other identified stakeholders, and their averaged confidence is something like p=0.4 that I performed that task well.
I have some weighting process that says whose opinion I take more seriously, so I adjust my confidence down to p=0.55.
I notice that this adjusts their averaged confidence downward, to something like p=0.35.
I perform more tasks, and poll more. I notice that people’s opinion of my performance tracks mine, but is invariably adjusted downward by about 33%. So I adjust my opinion of my performance downward by about 30%.
Magically, this causes their opinion of my performance to drop by about 25%.
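To make the coupling concrete, here’s a toy model of the loop I think I’m caught in. The 0.62 discount and the 0.4 mixing weight are assumptions fitted by eye to the numbers above, not measurements of anyone’s actual process:

```python
# Toy model of the feedback loop.  Both parameters are assumptions
# eyeballed from the numbers in this post, not measured quantities.

def their_report(my_report: float, discount: float = 0.62) -> float:
    """Their averaged confidence, modeled as a fixed fraction of mine."""
    return discount * my_report

def my_update(my_report: float, their_avg: float, weight: float = 0.4) -> float:
    """My adjusted confidence: a weighted blend of my report and theirs."""
    return (1 - weight) * my_report + weight * their_avg

mine = 0.65
for step in range(4):
    theirs = their_report(mine)
    print(f"step {step}: I report {mine:.2f}, they report {theirs:.2f}")
    mine = my_update(mine, theirs)
```

Under those assumptions the first couple of steps land close to what I observe (0.65/0.40, then 0.55/0.34), and the whole thing keeps ratcheting downward, which is exactly the spiral I’m describing.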
After taking enough samples, I start deliberately adjusting my opinion of my performance *upwards* massively. If I feel a 0.7 probability that I satisfied the requirement, I convince myself that it’s actually a 0.9 probability. Consistently, other people start treating my performance as if it were higher quality, but they ALSO treat me as if I’m an arrogant ass who can’t judge the quality of my own performance. They’re still setting their own opinions to something like 33% below mine, but when they translate the probability into something like ‘confidence weight’ (which I assume is some kind of log-odds transform), being 33% less confident than me FEELS like a much bigger gap when I’m 90% sure and they’re 60% sure than when I’m 65% sure and they’re 45% sure.
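For what it’s worth, the arithmetic backs the feeling up. Treating ‘confidence weight’ as a log-odds transform is my assumption, but under it the same relative discount produces a much wider gap at the high end:

```python
from math import log

def logit(p: float) -> float:
    """Log-odds of a probability p."""
    return log(p / (1 - p))

# The same roughly-one-third relative discount, measured in log-odds.
for mine, theirs in [(0.65, 0.45), (0.90, 0.60)]:
    gap = logit(mine) - logit(theirs)
    print(f"me {mine:.2f}, them {theirs:.2f}: log-odds gap {gap:.2f}")
```

The gap comes out around 0.82 nats at 0.65 vs 0.45 and around 1.79 nats at 0.90 vs 0.60, so the perceived disagreement more than doubles even though the percentage discount stays fixed.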
I’m convinced that there’s SOME signal in the noise of their confidence in my ability, but so much of it seems coupled to my OWN confidence in my ability that I can’t isolate it.
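To be concrete about what I mean by not being able to isolate it: suppose I logged my own confidence and their averaged confidence over many tasks. The question is whether their numbers contain anything beyond a rescaled copy of mine. A minimal sketch of that check, just my framing of the problem rather than a solution:

```python
from statistics import mean

def residual_signal(my_conf: list[float], their_conf: list[float]) -> list[float]:
    """
    Fit the best straight-line prediction of their confidence from mine
    (one-variable least squares), then return the leftovers.  If the
    residuals look like pure noise, their reports add nothing beyond
    what my own confidence already contains.
    """
    x_bar, y_bar = mean(my_conf), mean(their_conf)
    var_x = sum((x - x_bar) ** 2 for x in my_conf)
    cov_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(my_conf, their_conf))
    slope = cov_xy / var_x
    intercept = y_bar - slope * x_bar
    return [y - (slope * x + intercept) for x, y in zip(my_conf, their_conf)]
```

Whatever genuine signal exists would have to live in those residuals, and I don’t currently see how to get at it without it routing back through my own report.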
So, how do I calibrate?
I have three points of frustration here that I wish to caution responders about:
1. A meta-level problem I have with these situations is that I often notice people spending more time trying to convince me that they’re using something like Bayes’ rule than they spend actually generating observable evidence that they’re using something like Bayes’ rule. So assume that at least one of us is miscalibrated about being miscalibrated, *AND* that we’re collectively miscalibrated about being miscalibrated about being miscalibrated.
2. Many people, when I talk about this, tell me “oh, well *I* don’t take your confidence and adjust downward, so please exclude me from the list of people who do.” Most of the people who tell me this are in fact doing so. This means that even if you AREN’T doing so, saying those words is not evidence that you aren’t, and is in fact weak evidence that you are. If you decide to emit those words anyway, I will assume you didn’t actually grok the Sequences or the correct bits of HPMOR, and I will discount your advice accordingly.
3. Most people, when they try to describe the process they think they use to arrive at a confidence level that I solved a problem, craft a narrative about why some evidence is relevant and other evidence isn’t. These narratives suspiciously change from situation to situation, such that different bits of evidence count in one case and not in another, in a way that *looks* highly motivated to arrive at numbers that *appear* to just be tracking my own confidence level. Most people react with offense or frustration rather than curiosity when I ask them how they decide which evidence is relevant and which isn’t. Don’t be that guy.