I agree with everything you said; it seems like common sense. I’m going by the quote above, and by similar sentiments in multiple MIRI papers, that there are “logically impossible worlds” described by counterfactuals such as “same agent, same input, different outcome”. I am lost as to why this terminology is useful at all. Again, odds are I am missing something here.
An input to another algorithm may be our source code, with the other algorithm’s output depending on what it can prove about our output. If we assume it reasons consistently and want to prove something about its output, we might assume what it proves about us, even when that later turns out to be impossible.
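A toy sketch of this setup, assuming the “modal agent” formalization in the style of MIRI’s FairBot work: each agent sees its opponent’s name (standing in for its source code) and an oracle `provable(x, y)` meaning “it is provable that x cooperates with y”. Provability is evaluated on a linear Kripke frame for the provability logic GL. A box statement holds at world n iff its contents held at every earlier world, so at world 0 *everything* counts as provable. The agent names and the `play` helper are hypothetical illustrations, not an API from any paper:

```python
from typing import Callable, Dict, List, Tuple

# provable(x, y): "it is provable that agent x cooperates with agent y"
Oracle = Callable[[str, str], bool]
# An agent returns True (cooperate) or False (defect).
Agent = Callable[[str, str, Oracle], bool]


def fair_bot(me: str, opponent: str, provable: Oracle) -> bool:
    # Cooperate iff it is provable that the opponent cooperates with me.
    return provable(opponent, me)


def defect_bot(me: str, opponent: str, provable: Oracle) -> bool:
    return False


def cooperate_bot(me: str, opponent: str, provable: Oracle) -> bool:
    return True


AGENTS: Dict[str, Agent] = {
    "FairBot": fair_bot,
    "DefectBot": defect_bot,
    "CooperateBot": cooperate_bot,
}


def play(a: str, b: str, depth: int = 10) -> Tuple[bool, bool]:
    # history[n][(x, y)]: does x cooperate with y at Kripke world n?
    history: List[Dict[Tuple[str, str], bool]] = []
    for _ in range(depth):
        def provable(x: str, y: str) -> bool:
            # GL on a linear frame: box(phi) holds at world n iff phi held
            # at all earlier worlds (vacuously true at world 0).
            return all(world[(x, y)] for world in history)

        world = {
            (a, b): AGENTS[a](a, b, provable),
            (b, a): AGENTS[b](b, a, provable),
        }
        history.append(world)
    # These simple agents stabilize after world 1; read off the last world.
    final = history[-1]
    return final[(a, b)], final[(b, a)]


if __name__ == "__main__":
    print(play("FairBot", "FairBot"))    # (True, True)
    print(play("FairBot", "DefectBot"))  # (False, False)
```

At world 0, FairBot “assumes” even DefectBot cooperates, because every box is vacuously true there; later worlds overturn that assumption. This is one concrete way to see a reasoner provisionally using a claim about us that turns out to be impossible.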