The point of that part of my comment was this: insofar as part of Nora/Quintin’s response to the simplicity argument is to say that we have active evidence that SGD’s inductive biases disfavor schemers, that claim seems worth arguing for directly. Even if, e.g., counting arguments were enough to get you worried about schemers from a position of ignorance about SGD’s inductive biases, active counter-evidence that removes such ignorance could easily make schemers seem quite unlikely overall.
There’s a separate question of whether, e.g., counting arguments like mine above (e.g., “A very wide variety of goals can prompt scheming; by contrast, non-scheming goals need to be much more specific to lead to high reward; I’m not sure exactly what sorts of goals SGD’s inductive biases favor, but I don’t have strong reason to think they actively favor non-schemer goals; so, absent further information, and given how many goals-that-get-high-reward are schemer-like, I should be pretty worried that this model is a schemer”) do enough evidential labor to privilege schemers as a hypothesis at all. But that’s the question at issue in the rest of my comment. And in, e.g., the case of “there are 1000 Chinese restaurants in this town, and only ~100 non-Chinese restaurants,” the number of Chinese restaurants seems to me enough to privilege “Bob went to a Chinese restaurant” as a hypothesis, even without thinking that he made his choice by sampling uniformly at random over restaurants. Do you disagree in that restaurant case?