Yes, except I would object to phrasing this anthropic stuff as "we should expect ourselves to be agents that exist in a universe that abstracts well" rather than "we should value universes that abstract well (or other universes that contain many instances of us)". There are no coherence theorems that force summation over your copies, right? So it becomes apparent that we can value something else.
Also, even if you consider some memories part of your identity, you can value yourself slightly less after forgetting them, instead of having a single threshold only at death.
There is no such disagreement; you just can't test all inputs. And without knowledge of how the internals work, you may be wrong when extrapolating alignment to future systems.