I feel slightly worried about going too deep into discussions along the lines of “Vojta reacts to Chris’ claims about what other LW people argue against hypothetical 1-boxing CDT researchers from classical academia that they haven’t met” :D.
Fair enough. Especially since this post isn’t so much about the way people currently frame their arguments but an attempt to persuade people to reframe the discussion around comparability.
My take on how to do counterfactuals correctly is that this is not a property of the world, but of your mental models
According to this view, counterfactuals only make sense if your model contains uncertainty...
I would frame this slightly differently and say that this is the paradigmatic case which forms the basis of our initial definition. I think the example of numbers can be instructive here. The first numbers to be defined are the counting numbers: 1, 2, 3, 4… It is then convenient to add fractions, then zero, then negative numbers and eventually we extend to the complex numbers. In each case we’ve slightly shifted the definition of what a number is and this choice is solely determined by convention. Of course, convention isn’t arbitrary, but determined by what is natural.
Similarly, the cases where there is actual uncertainty provide the initial domain over which we define counterfactuals. And we can then try to extend this as you are doing above. I see this as a very promising approach.
A lot of what you are saying there aligns with my most recent research direction (Counterfactuals as a matter of Social Convention), although it’s unfortunately stalled with coronavirus and my focus being mostly on attempting to write up my ideas from the AI safety program. There seem to be a bunch of properties that make a situation more or less likely to be accepted by humans as a valid counterfactual. I think it would be viable to identify the main factors, with the actual weighting being decided by each human. This would acknowledge both the subjective, constructed nature of counterfactuals and the objective elements with real implications that keep this from being a completely arbitrary choice. I would be keen to discuss further/bounce ideas off each other if you’d be up for it.
Finally, when some counterfactual would be inconsistent with our model, we might take it for granted that we are supposed to relax M in some manner
This sounds very similar to the erasure approach I was previously promoting, but have shifted away from. Basically, when I started thinking about it, I realised that only allowing counterfactuals to be constructed by erasing information didn’t match how humans actually use counterfactuals.
Second, when doing counterfactuals, we might take it for granted that you are to replace the actual observation history o by some alternative o′
This is much more relevant to how I think now.
I think that “a typical AF reader” uses a model in which “a typical CDT adherent” can deliberate, come to the one-boxing conclusion, and find 1M in the box, making the options comparable for “typical AF readers”. I think that “a typical CDT adherent” uses a model in which “CDT adherents” find the box empty while one-boxers find it full, thus making the options incomparable
I think that’s an accurate framing of where they are coming from.
The third question I didn’t understand.
What was unclear? I made one typo where I said an EDT agent would smoke when I meant they wouldn’t smoke. Is it clearer now?
Hey Vojta, thanks so much for your thoughts.
I feel similarly. I’ve explained my reasons for believing this in the Co-operation Game, Counterfactuals are an Answer, not a Question and Counterfactuals as a matter of Social Convention.