As for the “why?” reaction: that is just how supervised machine learning works. A model learns the patterns in its training data (and interpolates between data points using the patterns it found). And the training data only contains information about prosecutions, not actual crime. If (purely hypothetically) people called Adam were found guilty 100% of the time in the training data, the model will pick up on that pattern, even though the name has nothing to do with the crime.
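To make the “Adam” point concrete, here is a minimal sketch in Python. Everything in it is made up for illustration: the names, the labels, and the helper functions `fit_guilt_rate_by_name` and `predict_guilty` are hypothetical, and the “model” is just a per-name frequency table rather than anything a real system would use. It only shows that a learner fed prosecution outcomes will happily latch onto the name.

```python
from collections import defaultdict

# Hypothetical toy data: (first_name, was_found_guilty).
# The labels record prosecution outcomes only, not whether a crime happened.
training_data = [
    ("Adam", 1), ("Adam", 1), ("Adam", 1), ("Adam", 1),
    ("Brian", 0), ("Brian", 1), ("Brian", 0), ("Brian", 0),
    ("Chloe", 0), ("Chloe", 0), ("Chloe", 1), ("Chloe", 0),
]

def fit_guilt_rate_by_name(rows):
    """Learn the fraction of guilty labels per name (a deliberately tiny 'model')."""
    counts = defaultdict(lambda: [0, 0])  # name -> [guilty_count, total_count]
    for name, guilty in rows:
        counts[name][0] += guilty
        counts[name][1] += 1
    return {name: g / n for name, (g, n) in counts.items()}

def predict_guilty(model, name, threshold=0.5):
    """Predict 'guilty' when the learned rate for this name exceeds the threshold."""
    return model.get(name, 0.0) > threshold

model = fit_guilt_rate_by_name(training_data)
print(model["Adam"])                   # rate learned for Adam from the toy data
print(predict_guilty(model, "Adam"))   # the name alone drives this prediction
print(predict_guilty(model, "Chloe"))
```

Nothing in the data says why every Adam was found guilty; the learner has no way to tell a real cause from an artifact of who got prosecuted.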
It’s really difficult to get truly unbiased training data. There are bias mitigation algorithms that can be applied after the fact to a model trained on biased data, but they have their own problems. First of all, their effectiveness at mitigating bias usually ranges from “bad” to “meh” at best. More importantly, most of them work by introducing a counter-bias, which can infuriate the people the model is now biased against, and that counter-bias has its own detrimental side effects. The correction also usually makes the model less accurate overall.
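As a rough illustration of the counter-bias and accuracy trade-off, here is a toy sketch of one family of after-the-fact fixes: using a different decision threshold per group so that both groups get flagged at the same rate. The groups, scores, labels, and hand-picked thresholds are all invented for the example, and `evaluate` is a hypothetical helper, not any library's API. Note that accuracy is measured against the recorded labels, which may themselves be biased.

```python
# Hypothetical toy records: (group, model_score, recorded_label).
records = [
    ("A", 0.9, 1), ("A", 0.8, 1), ("A", 0.7, 1), ("A", 0.3, 0),
    ("B", 0.6, 1), ("B", 0.4, 0), ("B", 0.3, 0), ("B", 0.2, 0),
]

def evaluate(rows, thresholds):
    """Return (accuracy, positive-prediction rate per group) for given thresholds."""
    correct = 0
    positives = {}
    totals = {}
    for group, score, label in rows:
        pred = int(score > thresholds[group])  # 1 = flagged, 0 = not flagged
        correct += int(pred == label)
        positives[group] = positives.get(group, 0) + pred
        totals[group] = totals.get(group, 0) + 1
    rates = {g: positives[g] / totals[g] for g in totals}
    return correct / len(rows), rates

# One shared threshold: fits the recorded labels, flags group A far more often.
baseline = evaluate(records, {"A": 0.5, "B": 0.5})

# Hand-picked per-group thresholds (an assumption) that equalize the flag rates.
adjusted = evaluate(records, {"A": 0.75, "B": 0.35})

print(baseline)
print(adjusted)
```

In this toy setup the flag rates go from 0.75 vs 0.25 to 0.5 vs 0.5, but accuracy against the recorded labels drops from 1.0 to 0.75, and one person in group B is now flagged only because of the lowered threshold. Real mitigation methods are more sophisticated, but the same kind of trade-off tends to show up.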
To give a physical analogy for attempts to fix the model “after the fact”…
If one of a helicopter’s blades gets chipped and ends up 10 cm shorter, you don’t want to fly on that unbalanced, heavy, rotating murder shuriken. You can “balance” it by chipping the opposite blade the same way, or all the blades the same way, but while that solves the balance problem, you now have less thrust and a lighter component, and you need to update everything else in the helicopter to accommodate that, and so on. So in reality you just throw away the chipped blade and get a new one.
Unfortunately, sometimes you can’t get unbiased data because it doesn’t exist.