I feel like the crux of this discussion is how much we should adjust our behavior to be “less utilitarian”, to preserve our utilitarian values.
The expected utility a person creates could be measured as (utility created by the behavior) × (odds that they actually follow through on it), where the odds of follow-through decrease as the behavior modifications become more drastic, while the utility created, if they do follow through, increases.
People already implicitly take this into account when evaluating the optimal amount of radicality in activism. If PETA advocates for everyone to completely renounce animal consumption, conduct violent attacks on factory farms, and aggressively confront non-vegans, that would (theoretically) reduce animal suffering by an extremely large amount, but in practice almost nobody would follow through. On the other hand, if PETA instead centers its activism on calling for people to skip a single chicken dinner, a completely realistic goal that many millions of people would presumably execute, it would also be missing out on a lot of expected utility.
Alice is arguing that Bob could maximize expected utility by shifting his behavior to a point on the curve with more behavior change (and therefore more utility if followed through) but a lower probability of follow-through. Bob is arguing that he’s already at the optimal point of the curve.
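The tradeoff can be sketched as a toy model. Everything here is an illustrative assumption (linear utility, exponentially decaying follow-through), chosen only to show that the product of the two terms peaks at an interior optimum rather than at either extreme:

```python
# Toy model of the expected-utility tradeoff: EU(x) = U(x) * P(x).
# Both functional forms below are arbitrary assumptions for illustration.
import math

def expected_utility(radicality, k=1.0):
    """Expected utility of a behavior change of a given 'radicality'.

    Assumes utility grows linearly with radicality, while the probability
    of actually following through decays exponentially with it.
    """
    utility = radicality                           # more drastic change, more utility
    p_follow_through = math.exp(-k * radicality)   # but less likely to stick
    return utility * p_follow_through

# Scan a grid of radicality levels and find the interior optimum.
levels = [i / 100 for i in range(0, 501)]
best = max(levels, key=expected_utility)
print(f"optimal radicality ~ {best:.2f}")  # peaks at 1/k under these assumptions
```

Under this sketch, both the maximally radical ask and the maximally easy ask yield less expected utility than the intermediate optimum, which is the shape of curve the PETA example and the Alice/Bob disagreement both presuppose.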
I think a good analogy would be to compare the genome with the hyperparameters of a neural network. It’s not perfect, since the genome influences human “training” much more indirectly (brain design, neurotransmitters) than hyperparameters do, but it shows that evolutionary optimization of the genome (hyperparameters) happens on a different level than actual learning (human learning and training).