Conditional on such counterexamples existing, I would usually expect to not notice them. Even if someone displayed such a counterexample, it would presumably be quite difficult to verify that it is a counterexample. Therefore a lack of observation of such counterexamples is, at most, very weak evidence against their existence; we are forced to fall back on priors.
You can check whether there are examples where it takes an hour to notice a problem, or 10 hours, or 100 hours… You can check whether there are examples that require lots of expertise to evaluate. And so on. the question isn’t whether there is some kind of magical example that is literally impossible to notice, it’s whether there are cases where verification is hard relative to generation!
You can check whether you can generate examples, or whether other people believe that they can generate examples. The question is about whether a slightly superhuman AI can find examples, not whether they exist (and indeed whether they exist is more unfalsifiable, not because of the difficulty of recognizing them but because of the difficulty of finding them).
You can look for examples in domains where the ground truth is available. E.g. we can debate about the existence of bugs or vulnerabilities in software, and then ultimately settle the question by running the code and having someone demonstrate a vulnerability. If Alice claims something is a vulnerability but I can’t verify her reasoning, then she can still demonstrate that it was correct by going and attacking the system.
I’ve looked at e.g. some results from the underhanded C competition and they are relatively easy for laypeople to recognize in a short amount of time when the attack is pointed out. I have not seen examples of attacks that are hard to recognize as plausible attacks without significant expertise or time, and I am legitimately interested in them.
I’m bowing out here, you are welcome to the last word.
You can check whether there are examples where it takes an hour to notice a problem, or 10 hours, or 100 hours… You can check whether there are examples that require lots of expertise to evaluate. And so on. the question isn’t whether there is some kind of magical example that is literally impossible to notice, it’s whether there are cases where verification is hard relative to generation!
You can check whether you can generate examples, or whether other people believe that they can generate examples. The question is about whether a slightly superhuman AI can find examples, not whether they exist (and indeed whether they exist is more unfalsifiable, not because of the difficulty of recognizing them but because of the difficulty of finding them).
You can look for examples in domains where the ground truth is available. E.g. we can debate about the existence of bugs or vulnerabilities in software, and then ultimately settle the question by running the code and having someone demonstrate a vulnerability. If Alice claims something is a vulnerability but I can’t verify her reasoning, then she can still demonstrate that it was correct by going and attacking the system.
I’ve looked at e.g. some results from the underhanded C competition and they are relatively easy for laypeople to recognize in a short amount of time when the attack is pointed out. I have not seen examples of attacks that are hard to recognize as plausible attacks without significant expertise or time, and I am legitimately interested in them.
I’m bowing out here, you are welcome to the last word.