Here’s a sketch of my thinking about the usefulness of metrics like the Big 5 for what CFAR is trying to do.
It would be convenient if there were a definitive measure of a person’s rationality that closely matched what we mean by the term and was highly sensitive to changes. But as far as I can tell there isn’t one, and there isn’t likely to be one anytime soon. So we rely on a mix of indicators, including some that are more like systematic metrics, some that are more like individuals’ subjective impressions, and some that are in between.
I think of the established psychology metrics (Big 5, life satisfaction, general self-efficacy, etc.) as primarily providing a sanity check on whether the workshop is doing something, along with a very very rough picture of some of what it is doing. They are quantitative measures that don’t rely on staff members’ subjective impressions of participants, they have been validated (at least to some extent) in existing psychology research, and they seem at least loosely related to the effects that CFAR hopes to have. And, compared to other ways of evaluating CFAR’s impact on individuals, they’re relatively easy for an outsider to make sense of.
A major limitation of these established psychology metrics is that they haven’t been that helpful for creating feedback loops. One of the main purposes of a metric is to provide input into CFAR’s day-to-day and workshop-to-workshop efforts to develop better techniques and refine the workshop. That is hard to do with metrics like the ones in the longitudinal study, because of a combination of a few factors:
The results aren’t available until several months after the workshop, which makes for very slow feedback and iteration.
The results are too noisy to tell whether differences from one workshop to the next are real changes or just random variation. It takes several workshops’ worth of data to get a clear signal on most of the metrics (see the rough sketch after this list).
These metrics are only loosely related to what we care about. If a change to the workshop leads to larger increases in conscientiousness, that does not necessarily mean we want to keep it, and when a curriculum developer is working on a class they are generally not that interested in these particular metrics.
These metrics are relatively general/coarse indicators of the effect of the workshop as a whole, not tied to particular inputs. So (for example) if we make some changes to the TAPs class and want to see if the new version of the class works better or worse, there isn’t a metric that isolates the effects of the TAPs class from the rest of the workshop.
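To make the noise point concrete, here is a back-of-the-envelope sketch in Python. All of the numbers are made up for illustration (roughly 25 participants per workshop, outcomes standardized to SD 1, and a hypothetical true improvement of 0.2 SD); the point is just that a single workshop-to-workshop comparison has a standard error larger than any plausible effect of a curriculum change.

```python
import math

# All numbers are hypothetical, chosen only to illustrate the scale of the noise.
participants_per_workshop = 25   # assumed cohort size
outcome_sd = 1.0                 # outcome standardized to SD = 1
true_improvement = 0.2           # hypothetical real effect of a curriculum change, in SD units

# Standard error of the difference in means between two independent workshops
se_one_comparison = outcome_sd * math.sqrt(2 / participants_per_workshop)
print(f"SE of a one-workshop comparison: {se_one_comparison:.2f} SD")  # ~0.28 SD, larger than the effect

# Workshops' worth of data per condition needed before the effect is about
# twice its standard error (a rough threshold for a clear signal)
workshops_needed = (2 * outcome_sd * math.sqrt(2) / true_improvement) ** 2 / participants_per_workshop
print(f"Workshops needed per condition: {workshops_needed:.0f}")  # ~8 with these made-up numbers
```

With these made-up numbers it comes out to something like eight workshops per condition before a change stops looking like noise, which is roughly the "several workshops worth of data" situation described above.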