Eliezer seems to argue that humans couldn’t verify pivotal acts proposed by AI systems (e.g. contributions to alignment research), and that this further makes it difficult to safely perform pivotal acts. In addition to disliking his concept of pivotal acts, I think this claim is probably wrong and clearly overconfident. It doesn’t match well with pragmatic experience in R&D, where verification is much, much easier than generation in virtually every domain.
I, personally, would like 5 or 10 examples, from disparate fields, of verification being easier than generation.
And also counterexamples, if anyone has any.
I’m just going to name random examples of fields. I think it’s true essentially all the time, but I only have personal experience in the small number of domains where I’ve actually worked:
It’s easier to recognize a good paper in computer science or ML than to write one. I’m most familiar with theoretical computer science, where this is equally true in domains that are not yet formalized, e.g. a mediocre person in the field is still able to recognize important new conceptual ideas without being able to generate them. In ML, verification requires more data than is typically present in a paper (but that data can be obtained, e.g., by independent replication or by inspecting the code).
Verifying that someone has done a good job writing software is easier than writing it yourself, if you are able to e.g. interact with the software, get clear explanations of what they did and why, and have them also write good tests (a minimal sketch of this kind of black-box check follows these examples).
Verifying a theory in physics is easier than generating it. Both in the sense that it’s much easier to verify that QM or the standard model or general relativity is a good explanation of existing phenomena than it is to come up with those models from scratch, and in the sense that e.g. verifying claims about how the LHC supports a given claim is easier than designing and building the LHC.
Verifying that someone has built a good GPU or a quantum computer is much easier than building one. This is completely clear if you are able to perform experiments on the computer. I also think it’s almost certainly true if you are trying to evaluate a design and manufacturing process, though I have less firsthand experience there.
There are a ton of fuzzy domains where we have less objective evidence, but the claim seems obviously true to me. Evaluating papers in philosophy, useful exercises in futurism, alignment ideas, etc. all seem meaningfully easier than generating them (particularly if we require them to come with convincing justification). I think other people have different intuitions here, but I’m not sure how to engage; if there are disagreements about more established fields, those are obviously nicer to use as examples.
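To make the software-verification example above concrete, here is a minimal sketch of the kind of black-box check described there, assuming nothing beyond the Python standard library; parse_date is a hypothetical stand-in for code someone else wrote and handed over for review.

```python
# Minimal sketch: black-box verification of someone else's code.
# `parse_date` is a hypothetical function delivered by another developer;
# we check its behaviour against cases we understand, without needing
# the expertise to have written a robust parser ourselves.

from datetime import date

def parse_date(s: str) -> date:
    # Stand-in implementation so the sketch runs; in the real scenario
    # this is the code under review, which we did not write.
    year, month, day = map(int, s.split("-"))
    return date(year, month, day)

def test_parse_date():
    assert parse_date("2024-02-29") == date(2024, 2, 29)   # leap day
    assert parse_date("1999-12-31") == date(1999, 12, 31)
    try:
        parse_date("2023-02-30")                            # impossible date
    except ValueError:
        pass
    else:
        raise AssertionError("expected invalid dates to be rejected")

test_parse_date()
```

The point of the sketch is only that the verifier needs to know what correct behaviour looks like on cases they understand, not how to build the thing being checked.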
This feels like stepping on a rubber duck while tip-toeing around sleeping giants but:
Don’t these analogies break if/when the complexity of the thing to generate/verify gets high enough? That is, unless you think the difficulty of verifying arbitrarily complex plans/ideas asymptotes at some human-or-lower level of verification capability (which I doubt you do), at some point humans can’t even verify the complex plan.
So, the deeper question just seems to be takeoff speeds again: If takeoff is too fast, we don’t have enough time to use “weak” AGI to help produce actually verifiable plans which solve alignment. If takeoff is slow enough, we might. (And if takeoff is too fast, we might not notice that we’ve passed the point of human verifiability until it’s too late.)
(I am consciously not bringing up ideas about HCH / other oversight-amplification ideas because I’m new to the scene and don’t feel familiar enough with them.)
I expect there will probably be a whole debate on this at some point, but as counterexamples I would give basically all the examples in When Money is Abundant, Knowledge is the Real Wealth and What Money Cannot Buy. The basic idea in both of these is that expertise, in most fields, is not easier to verify than to generate, because most of the difficulty is in figuring out what questions to ask and what to pay attention to, which itself requires expertise.
More generally, I expect that verification is not much easier than generation in any domain where figuring out what questions to ask and what to pay attention to is itself the bulk of the problem. Unfortunately, this is very highly correlated with illegibility, so legible examples are rare.
It’s not obvious to me that the proposed counter-examples (“expertise, in most fields, is not easier to verify than to generate”) are actually counter-examples. For example, for “if you’re not a hacker, you can’t tell who the good hackers are,” it still seems like it would be easier to verify whether a particular hack will work than to come up with it yourself, starting off without any hacking expertise.
First, “does the hack work?” is not the only relevant question. A good hacker knows that other things also matter—e.g. how easy the code is for another person to understand, or how easy it is to modify later on. This principle generalizes: part of why expertise is hard-to-recognize is because non-experts won’t realize which questions to ask.
Second, checking whether a program does what we intend in general (i.e. making sure it has no bugs) is not consistently easier than writing a correct program oneself, especially if the program we’re trying to check is written by a not-very-good programmer. This is the fundamental reason why nobody uses formal verification methods: writing the specification for what-we-want-the-code-to-do is usually about as difficult, in practice, as writing the code to do it. (This is actually a separate argument/line-of-evidence that verification is not, in practice and in general, easier than generation.)
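As a small illustration of the specification-versus-code point, here is a sketch using plain Python (the function names are illustrative): even for something as simple as sorting, a correct specification has to say both that the output is ordered and that it is a permutation of the input, and forgetting the second clause lets a buggy program pass.

```python
# Sketch: for sorting, the specification is itself nontrivial to state.
from collections import Counter

def meets_sort_spec(inp, out):
    # Spec clause 1: output is in nondecreasing order.
    ordered = all(out[i] <= out[i + 1] for i in range(len(out) - 1))
    # Spec clause 2: output is a rearrangement of the input.
    # Omitting this clause would accept, e.g., a program that returns [].
    same_elements = Counter(inp) == Counter(out)
    return ordered and same_elements

def buggy_sort(xs):
    # A "sort" that drops duplicates: it satisfies clause 1 but not clause 2.
    return sorted(set(xs))

xs = [3, 1, 2, 1]
print(meets_sort_spec(xs, sorted(xs)))      # True
print(meets_sort_spec(xs, buggy_sort(xs)))  # False
```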
One particularly difficult case is when the thing you’re trying to verify has a subtle flaw.
Consider Kempe’s proof of the four colour theorem, which was generally accepted for eleven years before being refuted. (It is, in fact, a proof of the five-colour theorem.)
And of course, subtle flaws are much more likely in things that someone has designed to deceive you.
Against an intelligent adversary, verification might be much harder than generation. I’d cite Marx and Freud as world-sweeping, seemingly obviously correct theories that eventually turned out to be completely worthless. I can remember a time when both were taken very seriously in academic circles.
Exactly. You can’t generalize from “natural” examples to adversarial examples. If someone is trying hard to lie to you about something, verifying what they say can very well be harder than finding the truth would have been absent their input, particularly when you don’t know if and what they want to lie about.
I’m not an expert in any of these and I’d welcome correction, but I’d expect verification to be at least as hard as “doing the thing yourself” in cases like espionage, hacking, fraud and corruption.
The entire P vs NP problem basically boils down to “is it easier to verify a correct answer than to generate it?” And while it’s still unproven, in our universe the answer seems to be yes. So conditioning on P not equaling NP, it’s much easier to verify that a proof or hypothesis is correct than to generate it in the first place.
But specific P problems can still be ‘too hard’ to solve practically.
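A minimal sketch of this verification/generation asymmetry, using subset sum (a standard NP-complete problem) as an illustrative example: checking a proposed certificate takes roughly linear work, while the naive way to generate one searches exponentially many subsets.

```python
# Sketch: verification vs generation for subset sum (NP-complete).
from itertools import combinations
from collections import Counter

def verify(nums, target, certificate):
    # Verification: check the certificate is a sub-multiset of nums
    # and sums to the target; roughly linear work in the input size.
    return not (Counter(certificate) - Counter(nums)) and sum(certificate) == target

def generate(nums, target):
    # Naive generation: tries up to 2^n subsets before finding a witness.
    for r in range(len(nums) + 1):
        for subset in combinations(nums, r):
            if sum(subset) == target:
                return list(subset)
    return None

nums, target = [3, 34, 4, 12, 5, 2], 9
cert = generate(nums, target)            # exponential-time search
print(cert, verify(nums, target, cert))  # [4, 5] True
```

The caveat from the reply above still applies: a problem can sit in P (cheap to verify and, in principle, to solve) and still be too expensive to solve in practice.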
Actually, my more specific question is “is verification still easier than generation, if the generation is adversarial?” That seems like a much more specific problem space than just “generation and verification in general.”
What kind of example are you looking for / what does your question mean?
I think if someone just tries their hardest to make “something that people will think is useful ML hardware” they will typically end up making useful ML hardware. I think this is most obvious for humans and human firms, but also very probably true for alien intelligences with quite different ability profiles.
I’m not sure if that’s what you mean by “adversarial” (it seems like it’s usually the relevant question), and if so I’m not sure how/whether it differs from the examples I gave.
I think if someone tries their hardest to make “something that people will think is useful ML hardware but isn’t,” I’m sure that’s also possible (though apparently much harder than just making useful ML hardware). Though on the flip side, if the task is “recognize an argument that this hardware isn’t actually useful,” I think that’s also much easier than generating the deceptive hardware itself.
(That discussion seems the same for my other 4 examples. If someone tries their hardest to produce “something that looks like a really great scientific theory” or “something that looks like a ground-breaking paper in TCS after careful evaluation” or whatever, you will get something that has a good probability of being a great scientific theory or a ground-breaking paper.)
It’s vastly easier to understand a maths proof (almost any maths proof) than it is to invent one.
It’s a lot easier to verify a solution to a problem in NP than it is to generate one (essentially by definition, and a lot of natural problems turn out to be NP-complete).
It’s a lot easier to check that someone caught a cricket ball than it is to catch one.
It’s a lot easier to check that someone can drive than to teach them.
It’s a lot easier to tell whether a program can distinguish cats from dogs than to write a program that can (see the sketch after this list).
Counterexamples:
It can be easier to write a correct computer program than to verify one, and easier to fix bugs than to find them.
It can be easier to find an algorithm than to prove that it works.
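Returning to the cats-vs-dogs example above, here is a minimal sketch of what verification looks like in practice: measuring the program on held-out labelled data. classify and eval_set are hypothetical placeholders; producing a good classifier is the hard part and is not shown.

```python
# Sketch: verifying a cats-vs-dogs classifier via held-out labelled data.
# `classify` and `eval_set` are hypothetical stand-ins: `classify` is the
# program being checked, `eval_set` is labelled data the verifier controls.

def accuracy(classify, eval_set):
    # Verification is just: how often does the program agree with the labels?
    correct = sum(classify(image) == label for image, label in eval_set)
    return correct / len(eval_set)

# Toy stand-ins so the sketch runs; real images and a real model go here.
eval_set = [("cat_photo_1", "cat"), ("dog_photo_1", "dog"), ("cat_photo_2", "cat")]
classify = lambda image: "cat" if "cat" in image else "dog"

print(accuracy(classify, eval_set))  # 1.0 on this toy data
```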
I agree that “verification is much, much easier than generation”.
But I don’t agree that verification is generally ‘easy enough’.
I am surprised no one has mentioned P vs NP and its myriad incarnations yet.