Yes, but your original comment was presented as explaining “how to properly reason about counting arguments.” Do you no longer claim that to be the case? If you do still claim that, then I maintain my objection that you yourself used hand-wavy reasoning in that comment, and it seems incorrect to present that reasoning as unusually formally supported.
Another concern I have is, I don’t think you’re gaining anything by formality in this thread. As I understand your argument, I think your symbols are formalizations of hand-wavy intuitions (like the ability to “decompose” a network into the given pieces; the assumption that description length is meaningfully relevant to the NN prior; assumptions about informal notions of “simplicity” being realized in a given UTM prior). If anything, I think that the formality makes things worse because it makes it harder to evaluate or critique your claims.
I also don’t think I’ve seen an example of reasoning about deceptive alignment where I concluded that formality had helped the case, as opposed to obfuscated the case or lent the concern unearned credibility.
The main thing I was trying to show there is just that having the formalism prevents you from making logical mistakes in how to apply counting arguments in general, as I think was done in this post. So my comment is explaining how to use the formalism to avoid mistakes like that, not trying to work through the full argument for deceptive alignment.
It’s not that the formalism provides really strong evidence for deceptive alignment, it’s that it prevents you from making mistakes in your reasoning. It’s like plugging your argument into a proof-checker: it doesn’t check that your argument is correct, since the assumptions could be wrong, but it does check that your argument is valid.