I think it would be much more interesting and helpful to exhibit a case of software with a vulnerability where it’s really hard for someone to verify the claim that the vulnerability exists.
Conditional on such counterexamples existing, I would usually expect to not notice them. Even if someone displayed such a counterexample, it would presumably be quite difficult to verify that it is a counterexample. Therefore a lack of observation of such counterexamples is, at most, very weak evidence against their existence; we are forced to fall back on priors.
I get the impression that you have noticed the lack of observed counterexamples and updated toward counterexamples being rare, without noticing that you would also mostly fail to observe counterexamples even if they were common. (Though of course this is subject to the usual qualifiers: it's difficult to guess other people's mental processes, you have better information than I do about whether you actually updated in that way, etc.)
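To make the evidential point concrete, here is the standard odds form of Bayes' rule. The labels are chosen for illustration: let H be the hypothesis that such counterexamples are common, and E the observation that none has been noticed.

```latex
\frac{P(H \mid E)}{P(\neg H \mid E)}
  = \frac{P(H)}{P(\neg H)} \cdot \frac{P(E \mid H)}{P(E \mid \neg H)}
```

If counterexamples would mostly go unnoticed even when common, then P(E | H) ≈ P(E | ¬H), the likelihood ratio is close to 1, and the posterior odds stay close to the prior odds. That is the sense in which we are forced to fall back on priors.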
That said, if I were to actively look for such counterexamples in the context of software, the obfuscated C code competition would be one natural target.
We can also get indirect bits of evidence on the matter. For instance, we can look at jury trials, and notice that they are notoriously wildly unreliable in practice. That suggests that, relative to the cognition of a median-ish human, there must exist situations in which one lawyer can point out the problem in another's logic/evidence, and the median-ish human will not be able to verify it. Now, one could argue that this is merely because median-ish humans are not very bright (a claim I'd agree with), but then it's a rather large jump to claim that e.g. you or I are so smart that analogous problems are not common for us.
For instance, we can look at jury trials, and notice that they are notoriously wildly unreliable in practice. That suggests that, relative to the cognition of a median-ish human, there must exist situations in which one lawyer can point out the problem in another's logic/evidence, and the median-ish human will not be able to verify it.
This is something of a tangent, but juries’ unreliability does not particularly suggest that conclusion to me. I immediately see three possible reasons for juries to be unreliable:
1. The courts may not reliably communicate to juries the criteria by which they are supposed to decide the case.
2. The jurors may decide to ignore the official criteria and do something else instead.
3. The jurors may know the official criteria and make a sincere attempt to follow them, but fail in some way.
You’re supposing that the third reason dominates. I haven’t made a serious study of how juries work in practice, but my priors say the third reason is probably the least significant, so this is not very convincing to me.
(I also note that you'd need to claim that juries are inconsistent relative to the lawyers' arguments, not merely relative to the factual details of the case, and it's not at all obvious to me that the evidence behind juries' reputation for unreliability actually controls for that distinction.)
Conditional on such counterexamples existing, I would usually expect to not notice them. Even if someone displayed such a counterexample, it would presumably be quite difficult to verify that it is a counterexample. Therefore a lack of observation of such counterexamples is, at most, very weak evidence against their existence; we are forced to fall back on priors.
You can check whether there are examples where it takes an hour to notice a problem, or 10 hours, or 100 hours… You can check whether there are examples that require lots of expertise to evaluate. And so on. The question isn't whether there is some kind of magical example that is literally impossible to notice; it's whether there are cases where verification is hard relative to generation!
You can check whether you can generate examples, or whether other people believe that they can. The question is whether a slightly superhuman AI can find examples, not whether they exist (and indeed, whether they exist is harder to falsify, not because they are hard to recognize but because they are hard to find).
You can look for examples in domains where the ground truth is available. E.g. we can debate the existence of bugs or vulnerabilities in software, and then ultimately settle the question by running the code and having someone demonstrate a vulnerability. If Alice claims something is a vulnerability but I can't verify her reasoning, she can still demonstrate that she was right by actually attacking the system.
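As a small illustration of the ground-truth point, here is a minimal sketch in Python. All names (`SECRET`, `check_token_buggy`, `check_token_fixed`, `demonstrate_exploit`) are hypothetical and not taken from any real system. It shows a token check with a flaw, plus a one-line exploit whose result settles whether the claimed vulnerability is real, independent of how easy the claimant's reasoning was to follow.

```python
import hmac

SECRET = "hunter2"  # hypothetical secret token, for illustration only


def check_token_buggy(supplied: str) -> bool:
    # Flawed check: accepts any prefix of SECRET, including the empty string,
    # because str.startswith("") is always True.
    return SECRET.startswith(supplied)


def check_token_fixed(supplied: str) -> bool:
    # Requires an exact match of the full token, using a constant-time comparison.
    return hmac.compare_digest(SECRET.encode(), supplied.encode())


def demonstrate_exploit(check) -> bool:
    """Return True if `check` accepts an empty token, i.e. the claimed hole is real."""
    return check("")


if __name__ == "__main__":
    print(demonstrate_exploit(check_token_buggy))  # True: the vulnerability is demonstrated
    print(demonstrate_exploit(check_token_fixed))  # False: the exploit fails
```

Anyone who runs the exploit can check its outcome directly, without having to evaluate the argument that predicted it.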
I've looked at e.g. some results from the Underhanded C Contest, and once the attack is pointed out they are relatively easy for laypeople to recognize in a short amount of time. I have not seen examples of attacks that are hard to recognize as plausible attacks without significant expertise or time, and I am genuinely interested in them.
I’m bowing out here, you are welcome to the last word.