Consider instead that Trump was elected with over 50% of the popular vote. Perhaps there are more fundamental cultural factors at play than the method used to count ballots.
Winning the popular vote in the current system doesn’t tell you what would happen in a different system. This is the same mistake people make when they talk about who would have won if we didn’t have an electoral college: If we had a different system, candidates would campaign differently and voters would vote differently.
I wonder if it would be helpful to penalize deception only if the CoT doesn’t admit to it. It might be harder generate test data for this since it’s less obvious, but hopefully you’d train the model to be honest in CoT?
I’m thinking of this like the parenting stategy of not punishing children for something bad if they admit unprompted that they did it. Blameless portmortems are also sort-of similar.