Some time ago you believed, correctly IMO, that you need a way of testing rationality skills first, and only then get busy on the exercises. What made you change your mind? (I hope it wasn’t something like “we need to push ahead asap”.) What’s the current plan for preventing the slide into epistemic viciousness? (I hope it isn’t something like “we will be smart and won’t let it happen”.)
We are interested in developing rationality measures; if you have ideas for how to do this, please post; if you’re interested in doing larger chunks of work toward developing such measures, please fill in the application form or email me. Blake Riley and I and some other rationality campers did some work on this over the summer, and slow work continues on the same front. Aaron Tucker and I made an experimental daily checklist that we’ve been playing with, for estimating one’s own habits and progress. I’d love to see this work go faster. (I just added a checkbox about this to the application form; thanks for pointing that out. There was a similar item on the call for volunteers that I posted locally some weeks ago, but I forgot about it when posting this round.)
It seems to me that rationality measures are valuable, but that creating exercises does not make our present lack of robust measures worse than it already is. Take a look at the linked unit on the sunk cost fallacy, above; when I tested it on newbies (and on LWers), they seemed interested, and started noticing sunk cost fallacy examples in their lives, and did not seem to be much flummoxed by questions of who was how rational or how one could really tell. The Sequences already teach some thinking skill without measures (much as the dance class I took a few years ago helped my dancing some without ever measuring my skill). Measures would be helpful; but refraining from creating exercises until after we have measures does not seem helpful to me.
creating exercises does not make our present lack of robust measures worse than it already is (...) they seemed interested, and started noticing sunk cost fallacy examples in their lives
Martial arts masters and psychotherapy gurus could say the same. Instead of sunk costs you could teach newbies to notice post-colonial alienation or intelligent design, and sure enough they’d get better at noticing that thing in their lives. I hear scientologists do lots of exercises too. Maybe creating exercises before measures is a positive expected value decision, but I wouldn’t bet on that.
“Sunk cost” is a pretty well-defined idea: we can reliably figure out whether something is a sunk cost, and whether a decision commits the sunk cost fallacy, by checking whether the decision controls the amount of lost value and whether the (immutable) amount of lost value controls the decision. Skill at noticing the sunk cost fallacy would then be the ability to parse such situations quickly and automatically.
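To make that criterion concrete, here is a minimal sketch (my own illustration, not anything from the thread; the deciders and option names are hypothetical): a decision procedure commits the sunk cost fallacy exactly when varying the immutable lost value changes which option it picks, since no choice can recover that value anyway.

```python
# Minimal sketch of the criterion above: a decider commits the sunk cost
# fallacy if the option it picks depends on the (immutable) sunk cost.

def commits_sunk_cost_fallacy(decide, options, sunk_costs):
    """decide(option, sunk_cost) -> score, higher preferred.
    Returns True if the chosen option varies with the sunk cost."""
    chosen = {max(options, key=lambda o: decide(o, s))["name"] for s in sunk_costs}
    return len(chosen) > 1

# A rational decider scores options on future value alone:
rational = lambda option, sunk: option["future_value"]
# A fallacious decider keeps "investing" in the option the loss is tied to:
fallacious = lambda option, sunk: (option["future_value"]
                                   + (sunk if option["name"] == "finish project" else 0))

options = [{"name": "finish project", "future_value": 10},
           {"name": "switch tasks", "future_value": 15}]

print(commits_sunk_cost_fallacy(rational, options, [0, 100]))    # False
print(commits_sunk_cost_fallacy(fallacious, options, [0, 100]))  # True
```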
Testing the effectiveness of training a skill is easier than testing the usefulness of the skill itself, and I think figuring out how to train people to avoid a list of fallacies, or to find correct decisions of standard kinds faster and more reliably, is a reasonable goal, even if the practical usefulness of having those skills remains uncertain.
How do you think we should proceed?
The first task of your full-time hire should be coming up with rationality-measuring tools that are better than human intuition.
If Anna and I can’t think of a simple way, you seem to have a rather exaggerated idea of what the full-time hire needs to be able to do. I don’t understand why people are reading this ad and thinking, “Hm, they want Superperson!” But it clearly needs to be rewritten.
I would be very, very surprised if you and Anna literally came up with nothing of value on measuring rationality; I expect there’s some raw material for a full-time employee to test, tweak and build on. This just seems to me like a higher priority than curriculum-building, and achieving a measure that’s better than subjective impressions doesn’t even seem impossible to me.
Here’s how typical people read typical job ads (typically), especially ones that are this long: Read the title. Scan for a dollar sign or the words “salary” or “salary range”. If both are good enough, scan for the first bulleted list of qualifications. Most ads call these “required qualifications”. If the reader meets enough of these, they scan for the second bulleted list of qualifications which is usually called “preferred qualifications”. Then, if they meet enough of both of these, they’ll go back and start reading in detail to understand the position better before they consider sending in an application or contacting the hiring entity for more information.
I suspect that most people expected your job ad to follow this form, since it almost does. Your sections are labeled, effectively, “needed” and “bonus”. It’s not until you get to reading the now-bolded details that you find out that not all of the “needed” stuff is required of the applicant, and that essentially any one of the needed qualifications will be sufficient. Basically, you don’t have any required qualifications, but you do have a general description of the sort of person you’re interested in and a list of preferred qualifications. In this regard, the ad is defective, as it fails to comport with the usual format.
Non-standard forms get experienced people’s hackles up; they often indicate that there’s something unprofessional about the organization.
It’s a project that has people such as you and lukeprog involved in it. (Luke wasn’t mentioned, but he was running the rationality camps etc., so people are going to associate him with this regardless of whether his name is actually mentioned.) You two can, with good reason, be considered Superpeople. I expect that many people will automatically assume that for a cause as important as this, you will only accept folks who are themselves Superpeople as well.
Don’t proceed. Stay at the drawing board until you figure out a viable attack. Stay there until you die, if you have to.
This seems like a rather extreme position to me. I’d be curious to hear you explain your thinking.
There isn’t much to explain. I just think that taking steps towards cultishness has lower expected utility than doing nothing.
To the extent that irrationality is a result of compartmentalization, measuring rationality may be the same thing as measuring how effectively you are accomplishing your goals, which will vary between people depending on what their goals are.
For most interesting goals I can think of, creating a rigorous quantitative measure is next to impossible. However, there are a few goals, like running a mile in under four minutes, that lend themselves well to this approach. Perhaps SI could find a group of individuals engaged in such a goal and offer their services as rationality consultants?
Some time ago you believed, correctly IMO, that you need a way of testing rationality skills first, and only then get busy on the exercises.
This is something that I was expecting them to do—or at least attempt—in the rationality bootcamp they ran last year. Yet they seemed to have lost all interest in testing by the time the camp came around. It seemed like a waste of potential.
Assume Eliezer instead said:
“I’m recruiting to put together a rationality test. It’s based on how you score on this series of individual questions. I am posting the “Sunk Costs” questions (see these linked PDF files), and we would like to hire people to develop this test further for other things which seem to be components of rationality.”
This would appear to meet your objection of “Some time ago you believed, correctly IMO, that you need a way of testing rationality skills first, and only then get busy on the exercises,” because in the way I am casting the argument, they are working on a test.
However, functionally, this seems very similar to what they are doing right now.
That being said, I don’t get an intuitive feeling that I’m refuting your central point, so either I need to improve my counterargument, I’m wrong about my refutation, or I need to update my intuition.
After trying to identify possible flaws in my argument, it occurs to me that a “test” would not have learning material such as the PowerPoint. It would also have a grading metric. But it would be hard to develop a grading metric without the full list of topics planned for the PDF files (you can’t develop a full rationality grading metric off of only sunk cost questions), and I feel like you would need to develop question-and-answer sets like those in the PDF files whether you were making exercises or tests.
If I’m correct, another way of expressing your point might be: “Fewer PowerPoints with M&M rewards and repeated mantras; those strike me as cultish. More questions like those in the PDF files. You could use those to build a test, and I agree with your earlier point that testing is critical.”
If I’m incorrect, can you help me understand where I went wrong?
Sure, all exercises can also be viewed as tests, but they make for pretty narrow tests and risk being irrelevant to the big picture. I’d like a more comprehensive test that would use many subskills at once. For example, when learning a foreign language, a simple exercise may look like “conjugate this verb”, and a comprehensive test may look like “translate this text” or “carry on a freeform conversation”. When learning a martial art, a simple exercise may look like “punch the bag exactly as I show you”, and a comprehensive test may look like “stay on your feet for two rounds against this guy”.
It seems that comprehensive tests are often toy versions of real-life problems. They guide the development of simple exercises and let you tell good exercises from bad ones. If someone cannot imagine a comprehensive test for their skillset, I don’t see how they can convince themselves that their simple exercises are relevant to anything.
Testing rationality is something of an ill-posed problem, in part because the result depends greatly on context. People spout all kinds of nonsense in a social context where it’s just words, but usually manage to compartmentalize the nonsense in a material context where they will be affected by the results of their actions. (This is a feature! Given that evolution wasn’t able to come up with minds that infallibly distinguish true beliefs from false ones, it’s good that at least it came up with a way to reduce the harm from false beliefs.) I’m not sure how to create an accurate test in the face of that.
Your martial arts analogy isn’t a bad one. The outcome of a karate contest is often not the same as the outcome of a street fight between the same participants. There are any number of cases of a black-belt karateka with ten years’ training getting into a fight with a scrawny untrained criminal and getting his ass kicked in three seconds flat. Martial arts practitioners have had this testing problem for centuries and still don’t seem close to solving it, which doesn’t make for optimism about our prospects of solving the rationality testing problem this century. Given that, proceeding as best we can in the absence of a comprehensive and accurate test seems reasonable.
People spout all kinds of nonsense in a social context where it’s just words, but usually manage to compartmentalize the nonsense in a material context where they will be affected by the results of their actions.
But doesn’t it seem that if you decompartmentalized with correct beliefs you should do way better? Possibly in a testable way?
Martial arts practitioners have had this testing problem for centuries and still don’t seem close to solving it, which doesn’t make for optimism about our prospects of solving the rationality testing problem this century.
See MMA. There is still the question of whether being a good fighter is as important as, or even related to, being good at self-defense, but martial arts are now measured at least relative to all other fighting styles.
But doesn’t it seem that if you decompartmentalized with correct beliefs you should do way better?
Maybe; there are all sorts of caveats to that. But that aside, more directly on the question of tests:
Possibly in a testable way?
You still run into the problem that the outcome depends greatly on context and phrasing. There is the Wason selection task, where you turn over cards to test a rule, on which people’s performance dramatically improves when you rephrase it as an isomorphic question about social rules. There are the trolley questions, the specks-versus-torture question, and the ninety-seven-percent-versus-one-hundred-percent question, on which the right answer depends entirely on whether you treat it as a mathematical question that happens to be expressed in English syntax or as a question about what you should do if you believed yourself to really be in that situation. There are questions about uncertain loss isomorphic to questions about uncertain gain where people nonetheless give different answers, which is irrational if considered as a material problem, but rational in the more likely and actual situation where the only thing at stake is social status, which sometimes does depend on how the question was phrased. Etc.
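As a toy illustration of that last isomorphism (my own example, using the classic “lives saved” framing rather than anything from this thread): the gain and loss framings below have identical expected outcomes, so anyone treating the question as a material problem should answer both frames the same way.

```python
# Toy example: two isomorphic framings of the same choice about 600 lives.
# Expected outcomes are identical, so a purely material decision-maker
# should be indifferent between frames; framing effects mean people often aren't.

TOTAL = 600

# Gain frame: save 200 for sure, or a 1/3 chance of saving all 600.
gain_sure = 200
gain_gamble = (1/3) * TOTAL + (2/3) * 0

# Loss frame: 400 die for sure, or a 1/3 chance nobody dies (2/3 all die).
loss_sure = TOTAL - 400
loss_gamble = (1/3) * TOTAL + (2/3) * 0  # expected lives saved, same gamble

print(gain_sure, gain_gamble)  # 200 200.0 -> same expected lives saved
print(loss_sure, loss_gamble)  # 200 200.0 -> identical under the loss frame
```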
That’s why I called the testing problem ill-posed; it’s not just that the solution is hard to figure out, it’s that it’s hard even to see what the criteria for a good solution would be in the first place.
Those examples are good evidence for us not being able to test coherently yet, but I don’t think they are good evidence that the question is ill-posed.
If the question is “how can we test rationality?”, and the only answers we’ve come up with are limited in scope and subject to all kinds of misinterpretation, I don’t think that means we can’t come up with broad tests that measure progress. I am reminded of a quote: “what you are saying amounts to ‘if it is possible, it ought to be easy’.”
I think the way to find good tests is not to look at how well people do against particular biases, but to look at what we think rationality is good for, and measure something related to that.
Ill-posed does not necessarily mean impossible. Most of the problems we deal with in real life are ill-posed, but we still usually manage to come up with solutions that are good enough for the particular contexts at hand. What it does mean is that we shouldn’t expect the problem in question to be definitively solved once and for all. I’m not arguing against attempting to test rationality. I’m arguing against the position some posters have taken that there’s no point even trying to make progress on rationality until the problem of testing it has been definitively solved.
Ok, that’s reasonable. I was taking ill-posed to mean something like a confused question.
The author of the original epistemic viciousness essay seems to think that culture (in other words, “being smart and not letting it happen”, or not) is actually pretty important:
http://www.artsci.wustl.edu/~grussell/epistemicviciousness.pdf
Just last week I was on the way home from a judo class with a friend—a senior judoka and university student—who insisted that although there was nothing wrong with lifting weights, strength was unimportant in judo, and it wouldn’t help one to become a better judo player. To this the appropriate reply is of course, unprintable.
...
Judo is an art in which there is relatively little room for pretence; in randori, either you manage to throw your opponent, or you don’t. In newaza, either you escape from your opponent’s hold or you don’t.
...
Why are there so many fantasists in the martial arts, as compared to other activities? And there are; you won’t find many sprinters or removal-men who would tell you that strength doesn’t matter to their chosen tasks, nor will you find power-lifters who think they can move the bar without touching it or engineers who specialise in ki-distribution.
I believe the judoka being quoted may have misheard, misremembered, or be misapplying a different point that is sometimes taught and that is not insane. I have elsewhere heard the advice that bulking up too early in one’s judo studies is counterproductive, because you have more margin for error in techniques if you can make up for doing them not-quite-correctly by being very strong, so really buff people may fail to notice and correct flaws in their form. Then they get whupped by people who actually mastered the techniques.
Of course, once you’ve reached yudansha, and already have a good grasp of form, then you’re supposed to bulk up to be able to beat other yudansha.
Could be true.
It’s not that important to what I was saying, though: the essay is mostly about how martial artists in particular have terrible epistemic hygiene. The idea of lack of measurement is only mentioned in passing, along with the remark that theoretical physics manages to be respectable despite it, and that the real problem is not that the martial arts lack measurement, but that martial artists are much more sure of themselves than their paucity of data justifies.
Defining key performance indicators for things like these is not very hard, and neither is developing ways to measure the performance. The harder part is tweaking the accuracy and fixing the gameable parts once the basics are done. Also, these metrics should, like any theory, stay in a continual beta state and get tweaked; just make it clear when the trend compared to previous measurements is broken. I can spend a little time on IRC teaching someone how to do this, but my time is extremely limited right now, so it will have to be a formal-ish appointment with an eager student.
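For concreteness, here is a minimal sketch of the kind of “continual beta” metric described above (hypothetical code, not an existing tool; the window and tolerance parameters are illustrative assumptions): it records measurements and flags when a new value breaks the trend set by recent history.

```python
# Hypothetical sketch of a "continual beta" metric: keep a history of
# measurements, and flag a trend break when a new value falls outside a
# tolerance band around the recent mean. Thresholds are illustrative.

from statistics import mean, stdev

class BetaMetric:
    def __init__(self, name, window=10, tolerance=2.0):
        self.name = name
        self.window = window        # how many past values define "the trend"
        self.tolerance = tolerance  # allowed deviation, in standard deviations
        self.history = []

    def record(self, value):
        """Store a measurement; return True if it breaks the recent trend."""
        recent = self.history[-self.window:]
        self.history.append(value)
        if len(recent) < 3:
            return False  # not enough history to define a trend yet
        m, s = mean(recent), stdev(recent)
        return s > 0 and abs(value - m) > self.tolerance * s

metric = BetaMetric("sunk-cost quiz score")
for score in [6, 7, 6, 7, 6, 7, 6, 2]:  # the final value breaks the trend
    if metric.record(score):
        print(f"{metric.name}: trend break at {score}")
```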