Education, like healthcare, is very expensive, mostly carried out for laypeople by trained specialists, and is generally considered (excepting people promoting their own one-size-fits-all solutions) a really knotty & complex thing to do
Let’s look at schools. Having a master’s degree in teaching doesn’t result in the teacher’s students getting higher grades on standardized tests. Unions still press schools to pay people who have a master’s degree in education more money and mostly succeed at opposing pay-for-performance.
Education is a good example of a field where teachers are paid for useless training instead of being paid for producing outcomes for students.
Bill Gates suggests in his TED talk:
So, how do you make education better?
Now, our foundation, for the last nine years, has invested in this. There’s many people working on it. We’ve worked on small schools, we’ve funded scholarships, we’ve done things in libraries. A lot of these things had a good effect. But the more we looked at it, the more we realized that having great teachers was the very key thing. And we hooked up with some people studying how much variation is there between teachers, between, say, the top quartile—the very best—and the bottom quartile. How much variation is there within a school or between schools? And the answer is that these variations are absolutely unbelievable. A top quartile teacher will increase the performance of their class—based on test scores—by over 10 percent in a single year. What does that mean? That means that if the entire U.S., for two years, had top quartile teachers, the entire difference between us and Asia would go away.
Given that citing a TED talk as a response to a scientific paper is a bit of bad form, I added a question on Skeptics Stack Exchange to verify the claim.
I think part of the problem with the cited study is likely that it uses data from before No Child Left Behind and the efforts of the Gates Foundation. It’s also simply possible to find groups of teachers where there’s little variance in teaching skill; that doesn’t mean that differences don’t exist on a larger scale.
Thought experiment: suppose every doctor were replaced by identical computers all running the same treatment-recommending software. How much does the variance in patient outcomes decrease?
There are treatments that require high skill to administer, like brain surgery, and treatments that require less skill, like handing over a pill. For the high-skill tasks I do think that variance in patient outcomes would decrease.
But even for simply taking a pill, a doctor can spend five minutes handing over the pill, or he can spend an hour talking through the issue with the patient and making sure that the patient has TAPs to actually take the pills according to the schedule.
Currently there’s no economic reason to spend that hour. It can’t be billed to the insurance company. Even if everybody currently spends five minutes on such a patient, you would suddenly get variance if you started to pay some doctors by outcome instead of simply paying them per visit.
This matters because it means individual practitioners are going to have a hard time beating the EBM approach at estimating treatment effects, because the statistical win of assessing treatments at the finer-grained level of the practitioner is going to be more than cancelled out by the statistical loss of each practitioner having a smaller sample to refer to.
I don’t think that large sample sizes are everything. I think you can learn a lot by looking very carefully at the details of single cases.
Additionally, the person who’s the best in city X at treating disease Y for class of patients Z might get more than 40 patients of class Z with Y because everybody wants to be treated by the best.
If all those expensively trained doctors achieve the same outcomes, there’s also the question of why we limit their supply as strongly as we currently do by forcing them to have the expensive training. In that case the way to go would be to get people with cheaper training to compete in the market with the highly trained doctors. In the absence of performance tracking this won’t be possible, because the expensively trained doctors have more prestige.
Education is a good example of a field where teachers are paid for useless training instead of being paid for producing outcomes for students.
That doesn’t surprise me. Training and degrees are easily observable; outcomes (or, rather, how much of an outcome is attributable to each teacher) are not. It’s harder to pay people based on something less observable.
Given that citing a TED talk as a response to a scientific paper is a bit of bad form, I added a question on Skeptics Stack Exchange to verify the claim.
Thanks!
But I’m not sure it matters. What Bill Gates says in that quotation might well be consistent with what I wrote — it’s hard to be sure because he’s quite vague.
For instance, he says that a teacher in the top quartile increases their class’ performance “by over 10 percent in a single year”. I can believe that, but maybe all it means is that a top-quartile teacher increases their class’ scores by 11% a year, while a bottom-quartile teacher increases their class’ scores by 9% a year. That would hardly refute the idea that teacher effects are small on average!
I think part of the problem with the cited study is likely that it uses data from before No Child Left Behind and the efforts of the Gates Foundation.
Maybe. I did a bit more searching on Google Scholar but didn’t uncover a more recent review article with relevant statistics. (I did find one study of a thousand students in HSGI schools in “school year 2013-2014”, which found that 12% of the variance in GPA was at the teacher level.)
However, while the study I cited uses only pre-2004 data, I don’t see much reason to think a newer study would reveal a big increase in teacher-level variance in outcomes. The low proportion of variance attributable to teachers has been true, as far as I know, for as long as people have investigated it (at least 46 years). I’m doubtful that an act which demands high-stakes testing, improvements in average state-wide test scores, and state-wide standards for teachers has changed that, or that a charity has changed that.
It’s also simply possible to find groups of teachers where there’s little variance in teaching skill; that doesn’t mean that differences don’t exist on a larger scale.
It’s possible, but I don’t see evidence that the paper I cited has this flaw. Its new analysis was based on about 100 Tennessee schools included in Project STAR, and the earlier analyses it summarized (table 1) used “samples of poor or minority students”, “nationally representative samples of students”, and “a large sample of public school students in Texas”.
There are treatments that require high skill to administer, like brain surgery, and treatments that require less skill, like handing over a pill. For the high-skill tasks I do think that variance in patient outcomes would decrease.
Fair point: my thought experiment only addresses diagnosis and treatment recommendation, not treatment administration. I think there would be more doctor-level variation in the latter, although not much more in absolute terms.
But even for simply taking a pill, a doctor can spend five minutes handing over the pill, or he can spend an hour talking through the issue with the patient and making sure that the patient has TAPs to actually take the pills according to the schedule.
Currently there’s no economic reason to spend that hour.
Probably true in most places.
Even if everybody currently spends five minutes on such a patient, you would suddenly get variance if you started to pay some doctors by outcome instead of simply paying them per visit.
Agreed that you’d get more variance, but I suspect it wouldn’t be much more (subject to this hypothetical’s exact details).
I don’t think that large sample sizes are everything. I think you can learn a lot by looking very carefully at the details of single cases.
This is certainly true. I’ve seen too many stories of people with rare genetic conditions and other diseases successfully working out aetiologies to think otherwise. But those are unusual cases where people tended to invest lots of effort into figuring their cases out. I don’t see them as signs that we can improve the quality-to-cost ratio of healthcare in general by looking very carefully at the details of each case.
Additionally the person who’s the best in city X at treating disease Y for class of patients Z might get more than 40 patients of class Z with Y because everybody wants to be treated by the best.
Sure. But the ratios in my example are more important than the exact numbers.
If all those expensively trained doctors achieve the same outcomes, there’s also the question of why we limit their supply as strongly as we currently do by forcing them to have the expensive training. In that case the way to go would be to get people with cheaper training to compete in the market with the highly trained doctors.
A good question, and a good answer to the question!
In the absence of performance tracking this won’t be possible because the expensively trained doctors have more prestige.
It’d be immediately possible in the US: eliminate de facto immigration barriers for foreign doctors. Those barriers are, I’d guess, lower in other developed countries (hence why doctors in the UK, Germany, Canada, etc. earn less than US doctors) but I expect doctors’ salaries there could also be reduced a bit by further relaxing immigration restrictions for foreign doctors.
Another option is for patients to go to the cheaper doctors: medical tourism.
For instance, he says that a teacher in the top quartile increases their class’ performance “by over 10 percent in a single year”. I can believe that, but maybe all it means is that a top-quartile teacher increases their class’ scores by 11% a year, while a bottom-quartile teacher increases their class’ scores by 9% a year. That would hardly refute the idea that teacher effects are small on average!
I understand him to be speaking about them increasing 10% more than non-top quartile teachers.
E.g., enough to close the US-Asia difference in two years and also enough to close the Black-White difference in four years, as suggested in the answer to the Stack Exchange question.
It’d be immediately possible in the US: eliminate de facto immigration barriers for foreign doctors.
The article doesn’t only describe immigration barriers but also barriers of credentialism. A non-US degree often isn’t enough to work in the US as a doctor because its quality is in doubt.
If there were hard evidence that those doctors perform as well as US doctors, the ability to seek rents via credentialism would be reduced.
Maybe. I did a bit more searching on Google Scholar but didn’t uncover a more recent review article with relevant statistics. (I did find one study of a thousand students in HSGI schools in “school year 2013-2014”, which found that 12% of the variance in GPA was at the teacher level.)
I looked at that study. It seems their best predictors were (i) Fall algebra EOC (End Of Course) scores, (ii) English language learner (ELL) status, (iii) Black student status, and (iv) Hispanic student status.
Of course you get a better prediction of student performance on a standardized test when you look at the last standardized test they took than when you look at whether or not they had a very good teacher for a single school year.
The 12% of variance under that setting might be compatible with the claims that Gates makes. A teacher effect of that size per year might compound over multiple years to bridge the gap between the US and Asia in two years and the US Black-White gap in four years.
I understand him to be speaking about them increasing 10% more than non-top quartile teachers.
OK, thanks for clarifying. That sounds like a more impressive effect. At the same time, it’s probably still consistent with teacher quality explaining only 10% of the variance in student performance.
I’ll do back-of-envelope arithmetic to demonstrate. The median top-quartile teacher is at the 88th percentile. The median non-top-quartile teacher is at the 38th. Suppose, just to allow me to arrive at concrete numbers, teacher quality has a normal distribution. Then the median top-quartile teacher is 1.48 standard deviations better than the median non-top-quartile teacher. Now, an R^2 of 10% implies a correlation of sqrt(10%) = 0.32 between teacher quality and pupil performance, so the difference in pupil performance between the median non-top-quartile teacher and the median top-quartile teacher is 1.48 * 0.32 = 0.47 standard deviations. That’s a statistically detectable effect, and one that could well translate into 10% higher test scores after a year with the better teachers.
E.g., enough to close the US-Asia difference in two years and also enough to close the Black-White difference in four years, as suggested in the answer to the Stack Exchange question.
Plausible. If I remember correctly the black/white difference is about 1 standard deviation, so if my estimated effect size of 0.47 SD for good vs. less good teachers is accurate and can be built on year by year, it’s enough to close the black/white difference in a little over 2 years. I don’t know the US-Asia difference but probably the same kind of logic applies.
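The back-of-envelope arithmetic is easy to check in a few lines of Python. This is only a sketch under the assumptions stated in the comment (normally distributed teacher quality, an R^2 of 10%, teachers at the 88th and 38th percentiles). The function name `teacher_effect_sd` and the idea that yearly gains simply add up are illustrative assumptions, not results from the cited studies.

```python
from math import sqrt
from statistics import NormalDist


def teacher_effect_sd(r_squared, top_pct=0.88, rest_pct=0.38):
    """Gap in pupil performance (in SDs) between teachers at two percentiles.

    Assumes teacher quality is normally distributed and that r_squared is
    the share of pupil-performance variance explained by teacher quality.
    """
    std_normal = NormalDist()  # mean 0, standard deviation 1
    # Gap in teacher quality between the two percentiles, in teacher SDs.
    quality_gap = std_normal.inv_cdf(top_pct) - std_normal.inv_cdf(rest_pct)
    # Correlation implied by the share of variance explained.
    correlation = sqrt(r_squared)
    return quality_gap * correlation


effect = teacher_effect_sd(0.10)
# Hypothetical: assumes each year's gain stacks linearly on the previous one.
years_to_close_one_sd = 1.0 / effect

print(f"effect per year: {effect:.2f} SD")
print(f"years to close a 1 SD gap: {years_to_close_one_sd:.1f}")
```

Passing a different `r_squared` (for example 0.12, the teacher-level share reported for GPA in the 2013-2014 study mentioned earlier) shows how sensitive the conclusion is to that single number.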
Agreed, medical error is a real & substantial issue. I am just dubious about the ability of some proposals to inexpensively reduce fatal medical error. (But I am optimistic about others. Checklists seem promising.)
The article doesn’t only describe immigration barriers but also barriers of credentialism.
The way I would put it is that the credentialism barriers are the immigration barriers. AFAIK the explicit immigration barriers for foreign doctors looking to enter the US and practice in the US aren’t the bottleneck; the requirement that the doctor do a US residency programme, or hold a degree from a US school, is a much stronger de facto bar to immigrating.
I am just dubious about the ability of some proposals to inexpensively reduce fatal medical error. (But I am optimistic about others. Checklists seem promising.)
In the present system there aren’t strong economic incentives to reduce medical error. If you consider checklists to be promising, then the lack of any economic incentive to use them might be part of the reason why they don’t get adopted.
The incentive system, in which doctors do procedures because they are on a list of billable procedures and do them in a defensive way that survives a lawsuit, is bad. It means that money is wasted on procedures that cost a lot and provide little benefit. It also means that policies such as checklists (if we grant that they work) don’t get incentivised.
The whole system is unable to incentivise cheap solutions. Scott’s post about the inability of a hospital to prescribe melatonin to its patients is illustrative:
This is why the story of Ramelteon scares me so much – not because it’s a bad drug, because it isn’t. But because one of the most basic and useful human hormones got completely excluded from medicine just because it didn’t have a drug company to push it. And the only way it managed to worm its way back in was to have a pharmaceutical company spend a decade and several hundred million dollars to tweak its chemical structure very slightly, patent it, and market it as a hot new drug at a 2000% markup.
The way I would put it is that the credentialism barriers are the immigration barriers.
From a political perspective immigration and credentialism are two different subjects; you have to convince different constituencies to create change.
In the present system there aren’t strong economic incentives to reduce medical error. [etc.]
I think this is broadly correct, certainly in the case of the US medical system.
From a political perspective immigration and credentialism are two different subjects; you have to convince different constituencies to create change.
Yes, from the standpoint of effecting political change, one might have to treat them as two different subjects, even though w.r.t. doctors in the US the two greatly overlap.
It’s worth noting that medical error is the third leading cause of death in the US: http://www.bmj.com/content/353/bmj.i2139
I agree with your last paragraph.