Okay, that makes more sense now. So the obvious question would be about the interpretation of low/high frequency. That’s quite a low-level concept. Does it correspond to simplicity/complexity? Or underfitting/overfitting?
I think it’s unclear what it corresponds to. I agree the concept is quite low-level. It isn’t obvious to me how to build up high-level concepts from “low-frequency” building blocks and then judge whether the result is low-frequency or not. That’s one reason I’m not super-persuaded by Nora Belrose’s argument that deception is high-frequency, as the argument seems too vague. However, it’s not as if anyone else is doing much better at the moment; e.g., the claims that utility maximization has “low description length” seem about as hand-wavy to me.
That’s an error. Thank you for pointing it out!