I think I’d feel differently about John’s list if it contained things that weren’t goodhartable, such as… I don’t know, most things are goodhartable. For example, citation density probably does have a causal impact (not just a correlation) on credence score. But giving truth or credibility points for citations is extremely gameable: a score based on citation density becomes worthless as soon as it becomes popular, because people will do what they would have done anyway and throw some citations in on top. Popular authors may not even have to do that themselves. The difference between what John suggested and a prediction market with a citation-count bot is that if that gaming starts to happen, the citation-count bot will begin failing (which is an extremely useful signal, so I’d be happy to have a citation-count bot participating).
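To make the Goodhart point concrete, here’s a toy sketch, not anyone’s actual scoring system: the `citation_density` function, the numbers, and the simulation are all invented for illustration. It shows a citation-density score tracking accuracy while nobody is optimizing for it, then decoupling once everyone pads citations.

```python
# Toy illustration of the Goodhart failure mode: a naive citation-density
# "credence" score correlates with accuracy before the metric is popular,
# and stops correlating once authors pad citations. All numbers invented.
import random
from statistics import correlation  # Python 3.10+

random.seed(0)

def citation_density(citations: int, claims: int) -> float:
    """Naive credence proxy: citations per claim, capped at 1.0."""
    return min(citations / max(claims, 1), 1.0)

def simulate(gamed: bool, n: int = 1000) -> float:
    """Correlation between an author's actual accuracy and their score."""
    accuracies, scores = [], []
    for _ in range(n):
        accuracy = random.uniform(0.3, 0.95)  # how careful the author really was
        claims = random.randint(20, 60)
        if gamed:
            # Once the metric is popular, everyone throws some citations
            # in on top regardless of how careful they were.
            citations = int(claims * random.uniform(0.6, 1.2))
        else:
            # Before that, citation habits loosely track actual care.
            citations = int(claims * accuracy * random.uniform(0.7, 1.3))
        accuracies.append(accuracy)
        scores.append(citation_density(citations, claims))
    return correlation(accuracies, scores)

print("before gaming:", round(simulate(gamed=False), 2))  # clearly positive
print("after gaming: ", round(simulate(gamed=True), 2))   # roughly zero
```

In a real market you’d presumably watch the bot’s track record against resolved questions rather than a correlation like this, but the same decoupling is the “useful signal” I mean above.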
Put another way: in a soon-to-air podcast, an author described how reading epistemic spot checks gave them a little shoulder-Elizabeth when writing their own book, pushing them to be more accurate and better justified. That’s a fantastic outcome that I’m really proud of, although I’ll hold the real congratulations until after I read the book. I don’t think a book would be made better by giving the author a shoulder-citation-bot, or even a shoulder-complex-multivariable-function. I suspect some of that is because epistemic spot checks are not a score but a process, and demonstrating a process people can apply themselves, rather than a score they can optimize, leads to better epistemics.
A follow-up question is “would shoulder-prediction-markets be as useful?” I think they could be, but that would depend on the prediction market being evaluated by something like the research I do, not by a function like the one John suggests. Prediction markets involve multiple people doing and sometimes sharing research; Ozzie has talked about them as a tool for collaborative learning rather than competition (I’ve pinged him and he can say more on that if he likes).
Additionally, John’s suggested metrics are mostly correlated with traditional success in academia, and if I thought traditional academic success was a good predictor of truth, I wouldn’t be doing all this work. That’s a testable hypothesis, and tests of it might look something like what John suggests, but I would view it as “testing academia”, not “discovering useful metrics”.
This question has spurred some really interesting and useful thoughts for me; thank you for asking it.
On them being for “collaborative learning”: the specific thing I was thinking of was how good prediction systems should really encourage introspectability and knowledge externalities in order to be maximally cost-effective. I wrote a bit about this here.