But in that case we just apply verification vs generation again. It’s extremely hard to tell if code has a security problem, but in practice it’s quite easy to verify a correct claim that code has a security problem. And that’s what’s relevant to AI delegation, since in fact we will be using AI systems to help oversee in this way.
I know you said that you’re not going to respond, but in case you feel like giving a clarification, I’d like to point out that I’m confused here.
Yes, it’s usually easy to verify that a specific problem exists if the exact problem is pointed out to you[1].
But it’s much harder to verify the claim that there are no problems and the code is doing exactly what you want.
And AFAIK staying in a loop:
1) AI tells us “here’s a specific problem”,
2) we fix that problem, then
3) go back to step 1)
doesn’t help with anything? We want to be in a state where the AI says “This is doing exactly what you want” and we have reason to trust that claim (and that is the claim which is hard to verify).
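To make concrete why I think this loop bottoms out in the claim we can’t check, here’s a minimal Python sketch (ai_review and apply_fix are hypothetical stand-ins I made up for illustration, not anything you proposed):

```python
# Hypothetical sketch only: `ai_review` and `apply_fix` are stand-ins.
# Assume ai_review(code) returns a specific, checkable problem report,
# or None when the AI claims the code is fine.

def oversight_loop(code, ai_review, apply_fix, max_rounds=100):
    """Repeatedly fix the problems the AI points out.

    Each individual report is easy to verify (step 1), and each fix is
    ordinary engineering (step 2). But the loop only terminates when
    ai_review returns None, i.e. when the AI claims "no problems left",
    and that is exactly the claim we said we can't easily verify.
    """
    for _ in range(max_rounds):
        problem = ai_review(code)        # easy to check when it's non-None
        if problem is None:              # the hard-to-verify claim...
            return code                  # ...is the only thing that ends the loop
        code = apply_fix(code, problem)  # fix the specific, verified problem
    raise RuntimeError("still finding problems after max_rounds")
```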
EDIT to add: I think I didn’t make it clear enough what clarification I’m asking for.
Do you think it’s possible to “win” using an AI which will point out problems (but which we can’t trust when it says everything is ok)? It would be very interesting if you did, and I’d love to learn more.
Do you think that we could trust the AI when it says that everything is ok? Again, that would be very interesting.
Or did I miss something? I’m curious to learn what, but then that’s just me being wrong (not “new path to win” interesting).
Also, it’s possible that there are two problems, each of which is easy to fix on its own, but which are really hard to fix at the same time (simple example: it’s trivial to have 0 false positives or 0 false negatives when testing for a disease; it’s much harder to eliminate both at once).
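To spell out the disease-testing example, here’s a toy sketch (the “tests” are deliberately silly, just the two trivial extremes):

```python
# Toy illustration of the trade-off: each trivial "test" eliminates one
# error type only by maxing out the other.

def test_always_positive(_patient):
    return True   # 0 false negatives: every sick patient gets flagged

def test_always_negative(_patient):
    return False  # 0 false positives: no healthy patient gets flagged

# Getting both error rates to zero at once requires actually telling sick
# and healthy patients apart, which is the genuinely hard part; analogously,
# an overseer that flags everything (or nothing) is easy, while one that is
# right in both directions is not.
```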
[1] Well, it can be hard to reliably reproduce a problem, even if you know exactly what the problem is (I know because I couldn’t write e2e tests to verify some bug fixes).