Would most of your concerns be alleviated if Tetlock just made all or most of the questions public?
More generally, it seems fine if you push for a norm of evaluating people only on public predictions, e.g. those made on Metaculus.
Not really. Here’s a cartoon example to explain why:
You don’t know math beyond basic algebra and don’t know the difference between calculus and linear algebra. Somebody gives you a linear algebra problem to solve. You hire me to get the answer.
To attract your business, I tell you the answers to 99 calculus I problems and one linear algebra problem. I get the calculus right and the linear algebra problem wrong, but you don’t know the difference. If I tell you that I “have 99% accuracy on advanced math problems posed to me,” then I’m not lying.
But does this mean I’m 99% likely to give you the right answer to your linear algebra problem? No, of course not!
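To make the bookkeeping explicit, here is a minimal Python sketch (the track record is made up to mirror the cartoon) showing how a single headline accuracy figure can coexist with 0% accuracy on the one category you actually care about:

```python
# Hypothetical track record mirroring the cartoon: 99 calculus problems
# answered correctly, one linear algebra problem answered incorrectly.
track_record = [("calculus", True)] * 99 + [("linear algebra", False)]

# Headline number: fraction correct across all self-selected questions.
overall = sum(correct for _, correct in track_record) / len(track_record)
print(f"Advertised accuracy: {overall:.0%}")  # 99%

# Broken out by question type, the picture is very different.
by_topic = {}
for topic, correct in track_record:
    by_topic.setdefault(topic, []).append(correct)
for topic, results in by_topic.items():
    print(f"{topic}: {sum(results) / len(results):.0%}")
# calculus: 100%, linear algebra: 0%
```

The headline number is dominated by the question mix, not by skill on the question you plan to ask.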
Let’s say I show you the problems I solved to obtain my 99% accuracy score. Do you have the ability to tell the difference between the calc and linear algebra problems? Not without some pretty thorough investigation, and you might make mistakes or not trust yourself because you’re not an expert.
What if my results were compared to a group of mathematicians answering the same problems? They’d get 100% vs my 99%. That doesn’t look like a big difference. You still have no way to know that I’m going to get your linear algebra problems wrong every time unless you ask me a lot of them.
Maybe I lose your business eventually, but I can still keep looking for new clients and broadcasting my overall accuracy on self-selected questions. If you ask me three hard questions and I get them all wrong, you take your business (and your hard questions) elsewhere. The clients I keep are the ones asking gimmes, which I keep getting right. As long as new hard-question-askers keep trusting my overall accuracy without being able to tell the hard questions from the easy ones, I'll keep misleading them.
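If it helps to see that dynamic play out rather than just read it, here is a rough simulation; all of the parameters (arrival rates, the three-strikes churn rule, the 120-month horizon) are invented purely for illustration:

```python
import random

random.seed(0)

# Toy model of the churn dynamic described above. The forecaster always
# gets "easy" questions right and "hard" questions wrong; hard-question
# clients leave after 3 misses, while new clients keep arriving and
# trusting the headline number.
EASY, HARD = "easy", "hard"

clients = []   # each client: question type + how many misses they've seen
answers = []   # (question_type, correct) pairs accumulated over time

for month in range(120):
    # A few new clients arrive each month; most ask easy questions.
    for _ in range(5):
        kind = HARD if random.random() < 0.2 else EASY
        clients.append({"kind": kind, "misses": 0})

    # Every client asks one question this month.
    for client in clients:
        correct = client["kind"] == EASY   # forecaster only gets gimmes right
        answers.append((client["kind"], correct))
        if not correct:
            client["misses"] += 1

    # Hard-question clients quit after 3 wrong answers; easy ones never see a miss.
    clients = [c for c in clients if c["misses"] < 3]

overall = sum(correct for _, correct in answers) / len(answers)
hard = [correct for kind, correct in answers if kind == HARD]
print(f"Advertised overall accuracy: {overall:.0%}")
print(f"Accuracy on hard questions:  {sum(hard) / len(hard):.0%}")
```

The exact numbers don't matter; the point is that the advertised figure ends up driven almost entirely by the gimmes that survive the churn, while accuracy on hard questions stays at zero.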
Now in this cartoon, any mathematician could come along and demolish my forecasting firm by pointing out that I was mixing two different types of questions and systematically failing the linear algebra.
There’s no easy way to do that with real-world forecasting.
The only way to deal with this problem is if an adversarial question-asker selects the questions posed to the forecasters.
So I’m positing that, in the absence of a formal system for doing that, you must be the adversarial question-asker yourself: count as evidence of accuracy only the questions you genuinely feel uncertain about.