Models trained for HHH are likely not trained to be corrigible. Models should be trained to be corrigible as well, in addition to other propensities.
Corrigibility may be included in Helpfulness alone, but adding Harmlessness removes corrigibility conditional on being changed to be harmful. From that point of view, the result is not that surprising.
It seems that your point applies significantly more to "zero-sum markets". So it may be worth noting that it may not apply to altruistic people working non-instrumentally on AI safety.