The Waterfall approach to software development gave way to Agile. SpaceX’s fast iterations let it run rings around Boeing’s “we carefully design it once and then it will fly” approach.
This is fine for other fields, but the problem with superintelligent alignment is that the “things” in “move fast and break things” are, like, us. We only have one chance to get superhuman alignment right, which is why we have to design it carefully once. Misaligned systems will try to turn you off to make sure you can’t turn them off. I think Eliezer has even said that if we could simply revert the universe after each time we mess up alignment, he would be way less pessimistic.
Further, experiments with systems that are not yet capable enough to kill us can provide valuable information, but much of the difficulty only shows up around superintelligence levels. Things are going to break in weird ways that we couldn’t have anticipated just by extrapolating the trends from current systems. So if we just wait for evidence that less robust solutions are not enough, we will see less robust solutions seem to work really well on weak current models, pat ourselves on the back for figuring out that alignment actually wouldn’t be that hard in area X, and then, as we approach superintelligence, start noticing X break down (if we’re lucky! if X is something like deception, the system will actively try to hide from us and avoid being caught by our interpretability tools or whatever). At that point it will be too late to fix the problem, because none of the technical or political solutions are viable on a very short time horizon.
Again, to be very clear, I’m not arguing that there is no use at all for empirical experiments today; it’s just that there are specific failure cases that are easy to fall into whenever you try to conclude something of the form “and therefore this is some amount of evidence that superintelligence will be more/less likely to be Y and Z”.
I agree that we cannot be cavalier about it, but not experimenting is strictly worse than experimenting (so long as it does not come at the expense of theoretical work), because humans are bad at pure theory.
The statement ‘humans are bad at pure theory’ seems to be clearly falsified by the extraordinary theoretical advances of the past, e.g. Einstein.
Whether theoretical or experimental approaches will prove most successful for AI alignment is an open question.
It is actually confirmed by this particular case. Special Relativity took some 40 years to be formulated after Maxwell’s equations were written down (1865 → 1905). General Relativity took roughly 300 years to be written down after Galileo’s experiments on the equal acceleration of falling bodies (early 1600s → 1915). AND it took a once-in-a-millennium genius to do that. (Twice, actually; Newton was the other one in physics.)
This doesn’t look like a serious reply. I fail to see how the achievements of Newton, Maxwell, and Einstein do not illustrate the power of theory.
I have nothing to add to my previous message, other than that 300 years to come up with a theory is a long time.