All very interesting.
The behavioural-problems theory could be tested by looking not at the average (mean) earnings in later life but at the median for each class.
So in a class of 15, one child has very poor behaviour, and all of them have slightly lower test scores that year. Twenty years later the other 14 get ordinary jobs for the area/cohort, but the adult who was the badly behaved child is perhaps unemployed, in prison, or in a job that pays less than normal for the area/cohort. That one person drags the mean down. But are the other 14 still where they would otherwise be? If they are unmoved, the median will barely have shifted.
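A toy calculation makes the point (the salary figures are made up purely for illustration):

```python
from statistics import mean, median

# Hypothetical salaries 20 years on: 14 classmates on ordinary local wages.
ordinary = [28000, 29000, 30000, 30000, 31000, 32000, 30000,
            29500, 30500, 31500, 28500, 30000, 29000, 31000]

# Scenario A: the 15th classmate also earns an ordinary wage.
class_a = ordinary + [30000]
# Scenario B: the 15th classmate (the badly behaved child) earns nothing.
class_b = ordinary + [0]

print(mean(class_a), median(class_a))  # mean 30000, median 30000
print(mean(class_b), median(class_b))  # mean 28000, median still 30000
```

One outlier pulls the class mean down by roughly 7%, while the median does not move at all, so comparing class means against class medians would separate "one child had a bad outcome" from "the whole class was pulled down".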
In a wider sense: the people doing these studies are, it sounds like, all doing proper and careful statistics. Unfortunately the great majority of school administrators, principals, etc. are not trained statisticians, and they work in a quite politically charged environment. This means that even well-researched, careful studies of things like "VAM" can be damaging to teachers' careers (even good teachers') and damaging to the education system at large.
Here in the UK, during COVID, the government asked teachers to predict the exam results students would have got had the exams not been cancelled. The predictions were, on average, higher than the previous year's results, so they programmed a computer to adjust the grades until the statistical distribution matched the year before, on a school-by-school basis. The most extreme case flagged was a small language school where only one student would have sat an exam. That student was awarded something like a D, because the only exam previously sat at that school had produced a D. I have a friend who is a teacher and was ordered to set an internal test for about 5 physics students (it is a special-needs school, so classes are small). Afterwards the computer (and the computer-like administrators) were all very unhappy. She was warned by one system that a 0% failure rate was far too low, while another flagged the one test with a 20% failure rate as too high. She says none of these people seemed to realise that a 10% failure rate was not possible with 5 students, even when it was explained to them.
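The granularity problem is trivial to see: with 5 students, the failure rate can only move in 20-point steps, so nothing between 0% and 20% is achievable. A one-liner shows every rate the class could possibly produce:

```python
n = 5  # students in the class
# Each additional failing student moves the rate by 100/n = 20 points.
possible_rates = [100 * failures / n for failures in range(n + 1)]
print(possible_rates)  # [0.0, 20.0, 40.0, 60.0, 80.0, 100.0]
```

Any threshold set between those steps (e.g. "flag anything above 10%") is guaranteed to fire on one side or the other for a class this small.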
Long-winded way of saying that even if VAM is completely accurate, with no artefacts, it is still probably a bad idea to hire or fire teachers based on it, simply because the people running that process are not equipped with the statistical knowledge to implement it correctly.