I think section 3 still mostly stands, but the arguments leading to it change slightly. Section 4 changes a lot more: the distinction between “A’s values, according to A” vs “A’s values, according to B” becomes crucial, i.e. A may have a very different idea than B of what it means for A’s values to be satisfied in extreme out-of-distribution contexts. In the hard version of the problem, there isn’t any clear privileged notion of what “A’s values, according to A” would even mean far out-of-distribution.