Yes, the point of the proof isn’t that the sane pure bets condition and the weak indifference condition are the be-all and end-all of corrigibility. But using the proof’s result, I can see that your AI will happily bet a million dollars against one cent that the shutdown button won’t be pressed, which doesn’t seem desirable. It’s effectively willing to burn arbitrary amounts of utility if we present it with the right bets.
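To make the million-dollars-against-one-cent bet concrete, here is a minimal sketch of the arithmetic. It assumes (my simplification, not something the proof itself specifies) that the agent evaluates the bet as though the probability of the button being pressed were zero, and that payoffs are linear in dollars. The names `expected_value`, `STAKE`, and `PAYOFF` are hypothetical.

```python
def expected_value(p_press: float, stake: float, payoff: float) -> float:
    """Expected dollar value of a bet that the button is NOT pressed.

    The bettor wins `payoff` if the button is not pressed and
    loses `stake` if it is pressed.
    """
    return (1 - p_press) * payoff - p_press * stake


STAKE = 1_000_000.0   # dollars lost if the button is pressed
PAYOFF = 0.01         # dollars won if the button is not pressed

# Assumption: the agent acts as if the button will never be pressed.
agent_view = expected_value(0.0, STAKE, PAYOFF)

# An outside observer who thinks there is a 1% chance of a press
# (an illustrative number, not taken from the discussion).
observer_view = expected_value(0.01, STAKE, PAYOFF)

# Probability of a press above which the bet loses money in expectation.
break_even = PAYOFF / (PAYOFF + STAKE)

print(f"agent's evaluation:    {agent_view:+.4f}")
print(f"observer's evaluation: {observer_view:+.4f}")
print(f"break-even P(press):   {break_even:.3e}")
```

Under these assumptions the agent sees a small positive expected value and accepts, while anyone assigning more than roughly a one-in-a-hundred-million chance to the button being pressed sees a large expected loss. Scaling `STAKE` up makes the loss arbitrarily large without changing the agent's decision.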
Ideally, a successful solution to the shutdown problem should violate one or both of these conditions in clear, limited ways that don’t result in unsafe behavior, or that result in suboptimal behavior whose suboptimality falls within well-defined bounds. Rather than guessing and checking potential solutions and being surprised when they fail to satisfy both conditions, we should look specifically for agents that are not sane pure betters, or not intuitively indifferent, but which nevertheless behave corrigibly and desirably.
actually growing up in Seattle my experience has been that people’s narratives of trans rights are in fact making a pretty principled case for both morphological freedom and some kind of more abstract self-labelling freedom. which you can see in how big like, nonbinary and agender self-identifications are, and also a heavy overlap between online trans communities and eg DID and furry communities. so maybe this is just a problem with your generation, or something?