lessdazed comments on Minicamps on Rationality and Awesomeness: May 11-13, June 22-24, and July 21-28

lessdazed 30 Mar 2012 19:19 UTC
6 points
The main problem is that a test tests ability to take the test, independently of what its makers intended. The more similar tests are to each other, the more taking the first is training for the second, and the easier it is to teach directly to the test rather than to the skill that inspired the test. The less similar the before and after tests are, the less comparable they are.

Rationality training is particularly tricky because one is to learn formal models of both straight and twisted thinking, recognize when real-life situations resemble those patterns, and then decide how much formal treatment to give the situation, as well as how much weight to give to one’s formal model as against one’s feelings, reflexive thoughts, and so on.

Traditional classroom tests are best suited to testing the first part, knowledge of the formal models, and that is assuming one has solved the problems inherent in testing. Even to the extent one can ask people how they ought to react in the field, e.g. when to use which sort of calculation, that is still a question with a correct answer according to a formal model, and one is still not testing the ability to apply it!

These problems resemble those the military has faced in its training and testing. It uses indoctrination, simulations, and field tests. Decision making is tested under uncomfortable conditions, which makes good decision making probable under most circumstances. In general, knowing what the military does is likely to be helpful.

The problems with tests are not intractable. One can limit the gain on the second test from having taken the first test by saturating the test taker with knowledge of the test before it is taken the first time, though few would be motivated. One can try to make a test similar to the skill tested, so ability at the test is well correlated with the skill one intends to test. One can try to devise very different sorts of tests that measure the same thing (I doubt that will work here).

One component of a useful classroom test might resemble the classic research on correspondence bias. In it, people judge individuals’ support for positions based on an essay they supposedly wrote. Some subjects are told that the writer chose the thesis, others that the writer had it assigned. (The theses were either pro- or anti-Castro.) People inferred that the essay’s author significantly agreed with the thesis even when they were told it had been assigned to the author. The quality of an essay a person produces is some evidence of what they believe, as is their willingness to write it at all, etc., but in general people infer too much about others’ dispositions from actions taken under social constraint, even when they know of the constraint.

Here is how the framework could translate into a useful rationality test: the test would give people some evidence for something they are biased to believe too strongly, and the quantity and quality of legitimate evidence in the test would vary widely. One would not be able to pass the test by simply detecting the bias and then declaring oneself unmoved in that wrong direction, as one might be able to do for, say, sunk costs. Instead, the valid evidence and invalid inclination would be along the same vector, such that one would have to distinguish the bias from the rest of the evidence in the environment.

This solves the problem of having a classroom test be an easy exercise of spotting the biased thought pattern and quashing it. Videos or essays of various people with known beliefs arguing for or against those beliefs could be used to train and test people in this. It’s actually probably a skill one could learn without any idea of how one was doing it.

Expressed abstractly, the idea is to test for ability to quantify wrong thinking by mixing it with legitimate evidence, all of which increases confidence in a particular conclusion. This is hard to game because the hard part isn’t recognizing the bias. The material’s being media from real life prevents testers from imposing an unrealistic model that ignores actual evidence (e.g., a strongly pro-Castro person really might refuse to write an anti-Castro essay).
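
To make the "hard to game" point concrete, here is a rough sketch of how such a test might be scored. Everything in it is my own assumption for illustration: the likelihood ratios in `ITEMS`, the bias factor of 4, and the choice of log-odds distance as the error measure are all made up, and real test material would need its legitimate evidence estimated some other way. Each item carries a different amount of legitimate evidence, and an answer is scored by how far it lands from the normative Bayesian posterior.

```python
import math


def posterior(prior, likelihood_ratio):
    """Bayes' rule in odds form: posterior odds = prior odds * likelihood ratio."""
    odds = prior / (1.0 - prior) * likelihood_ratio
    return odds / (1.0 + odds)


def log_odds(p):
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def mean_error(strategy, items, prior=0.5):
    """Average absolute log-odds gap between a strategy's answers
    and the normative posterior, taken over all test items."""
    gaps = [
        abs(log_odds(strategy(prior, lr)) - log_odds(posterior(prior, lr)))
        for lr in items
    ]
    return sum(gaps) / len(gaps)


# Hypothetical items: the legitimate likelihood ratio each essay or video
# provides for "the speaker believes the thesis". Values are invented.
ITEMS = [1.0, 1.5, 3.0, 9.0]

# Assumed strength of the bias: how much the naive reader inflates the evidence.
BIAS_FACTOR = 4.0


def quasher(prior, lr):
    """Spots the bias and refuses to update at all."""
    return prior


def overreader(prior, lr):
    """Updates on the legitimate evidence and on the bias together."""
    return posterior(prior, lr * BIAS_FACTOR)


def calibrated(prior, lr):
    """Updates on exactly the legitimate evidence."""
    return posterior(prior, lr)


if __name__ == "__main__":
    for strategy in (quasher, overreader, calibrated):
        print(strategy.__name__, round(mean_error(strategy, ITEMS), 3))
```

In this toy setup the strategy that merely declares itself unmoved is penalized on every item that carries real evidence, and the one that gives in to the bias is penalized everywhere, so only separating the two does well. Varying the legitimate evidence across items is what prevents a single fixed answer from passing.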