“you maximize Bayes Score iff you use all your knowledge as well as possible.”
yes, but in a test where you have no knowledge (e.g. Eliezer is a great rationalist but knows nothing about Pokémon) this is unhelpful… This test would work well for ranking rationalists iff you had a set of general-knowledge questions that you were confident everyone knew roughly the same amount about.
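To see why the quoted claim holds in the first place, here is a minimal sketch (my own illustration, not from the original exchange) of the log score's propriety: for a binary question, your expected score peaks exactly at your true belief, so the only way to score well is to actually use what you know.

```python
# Minimal sketch of why the log ("Bayes") score rewards honest use of
# knowledge: if your true belief that the answer is correct is p, the
# expected score of reporting q is maximized exactly at q = p.
import numpy as np

def expected_log_score(p: float, q: float) -> float:
    """Expected log score when the event occurs with probability p
    and you report probability q."""
    return p * np.log(q) + (1 - p) * np.log(1 - q)

p = 0.7  # your actual, knowledge-informed belief (assumed for illustration)
qs = np.linspace(0.01, 0.99, 99)
best_q = qs[np.argmax([expected_log_score(p, q) for q in qs])]
print(best_q)  # ~0.7: any distortion of your true belief costs you score
```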
The test would also work statistically to measure the effect of an intervention, provided you had enough subjects relative to the variance. A test with too much variance can’t be used to rank individuals within an organization, but it can still serve as an experimental instrument.
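As a hedged illustration of that point (all the numbers here are assumptions of mine), a quick simulation shows a test far too noisy to rank individuals still detecting a modest group-level effect once there are enough subjects:

```python
# Illustrative sketch: between-subject variance swamps any individual
# ranking, yet a modest intervention effect is statistically detectable
# at the group level with enough subjects.
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(0)
n = 200                           # subjects per group (assumed)
control = rng.normal(50, 20, n)   # high between-subject variance
treated = rng.normal(55, 20, n)   # intervention shifts the mean by 5 points

t_stat, p_value = ttest_ind(treated, control)
print(f"p = {p_value:.4f}")       # typically significant despite the noise
```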
If you are asked about Pokémon, AI design, 13th-century Chinese history, Martian geology, German literature, Yankees batting averages, lyrics to popular songs from the 1820s, etc., you would be forced to get maximal mileage out of whatever knowledge you can bring to bear on each question, which in most cases would be slim to none.
If the questions are chosen randomly and eclectically enough, there should be no way to game the system, and scores should average out for people knowledgeable in different areas.
If you dependably know more than I do across a broad spectrum of subject areas, then I would assume that you have learned more than I have during your life so far, which seems to me to be symptomatic of good rationality.
“across a broad spectrum of subject areas … questions are chosen randomly”
but this is the real weasel in there. Defining a good prior over “subject areas” is problematic. A very rational nerd would get wiped out if there were too many trivia questions… which is what happened to me just now on Tom’s rationality test:
http://www.acceleratingfuture.com/tom/calibrate.php
Though my calibration on this test was very good, my Bayes Score was rubbish. Most of the questions were about America (cultural bias), and most were about people (subject-area bias). I like my idea of (calibration) * (Bayes Score).
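As a rough sketch of that combined metric, here is one way it could be computed. The calibration measure (one minus a simple binned calibration error) and the normalization of the Bayes Score against a 50/50 baseline are my own assumptions; without some such normalization the log score is negative, and multiplying by a calibration factor would reward the wrong thing.

```python
# Rough sketch of a (calibration) * (Bayes Score) metric over binary
# questions. Both formulas below are assumptions for illustration; the
# comment above doesn't pin them down.
import numpy as np

def bayes_score(probs, outcomes):
    """Mean log score relative to a 50/50 baseline, so that
    better-than-chance performance comes out positive."""
    probs = np.clip(probs, 1e-9, 1 - 1e-9)
    ll = outcomes * np.log(probs) + (1 - outcomes) * np.log(1 - probs)
    return float(np.mean(ll - np.log(0.5)))

def calibration(probs, outcomes, bins=10):
    """1 minus a crude expected calibration error over equal-width bins."""
    edges = np.linspace(0, 1, bins + 1)
    err = 0.0
    for i in range(bins):
        lo, hi = edges[i], edges[i + 1]
        # include the right edge only in the last bin
        mask = (probs >= lo) & ((probs <= hi) if i == bins - 1 else (probs < hi))
        if mask.any():
            err += mask.sum() / len(probs) * abs(outcomes[mask].mean() - probs[mask].mean())
    return 1 - err

probs = np.array([0.9, 0.8, 0.6, 0.3, 0.2])   # stated confidences per question
outcomes = np.array([1, 1, 1, 0, 0])          # 1 if the answer was correct
print(calibration(probs, outcomes) * bayes_score(probs, outcomes))
```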