Where are insights about the relative usefulness of pure theory going to come from?
That wasn’t at all the sort of insight I had in mind. It’s commonplace in science to start trying to understand complicated things by first considering simpler things. Then sometimes you learn techniques that turn out to be applicable in the harder case, or obstacles that are likely still to be there in the harder case.
(Lots of computer science research has considered computers with literally unlimited memory, models of computation in which a single operation can do arbitrary arithmetic on an integer of any size, models of computation in which the cost of accessing memory doesn’t depend on how big the memory is, etc., and still managed to produce things that end up being useful for actual software running on actual computers with finite memories and finite registers running in a universe with a finite maximum speed limit.)
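The idealised-model point can be made concrete with a toy sketch (the example is mine, not from the original discussion): binary search is usually analysed in a model where comparing two integers costs one step no matter how large they are. That assumption is false on real hardware, yet the algorithm the analysis justifies remains useful on real machines and real data.

```python
# Binary search, analysed under the unit-cost comparison assumption:
# O(log n) comparisons, each treated as a single step regardless of
# how big the integers being compared are.
def binary_search(sorted_items, target):
    lo, hi = 0, len(sorted_items)
    while lo < hi:
        mid = (lo + hi) // 2
        if sorted_items[mid] < target:
            lo = mid + 1
        else:
            hi = mid
    if lo < len(sorted_items) and sorted_items[lo] == target:
        return lo
    return -1

# Works even on integers vastly larger than any machine word -- the
# idealised model was a simplification, not a lie.
huge = [10**i for i in range(1000)]
assert binary_search(huge, 10**500) == 500
assert binary_search(huge, 7) == -1
```

The analysis done in the "unrealistic" model transfers essentially unchanged to the finite machine; that is the sense in which studying the simpler setting first pays off.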
It’s not even conceivable?
Well, I guess it depends on what you mean by “quite conceivable”. Obviously anyone can say “we might be able to make a cobbled-together AI safe by piecemeal engineering solutions”, so if that counts as “conceiving” then plainly it’s conceivable. But conceivability in that sense is (I think) completely uninteresting; what we should care about is whether it’s at all likely, and that’s what I took you to mean by “quite conceivable”.
It seems to me that automotive safety and AI safety are extremely different problems. Or, more precisely, they may or may not turn out to be, but in the imaginable cases where most is at stake (those in which it turns out that it really is possible to produce intelligences vastly greater than ours and that intelligence vastly greater than ours really does lead to much greater ability to influence the world) they are extremely different.
The kind of Clean Engineering you are talking about can only be specific to a particular architecture, which pure theory isn’t.
The point of the pure theory is to help figure out what kind of engineering is going to be needed.
hard limit to how much you can predict [...] without knowing its architecture
If you give me a good program that plays chess or does symbolic algebra, in a lot of situations I reckon I can predict what it will do quite well even if I know nothing about its architecture; and I’m not sure what we’ve learned about brain architecture has made a big difference to how well we can predict human behaviour. It might some day.
I’m sure that very detailed prediction of what a superhumanly intelligent AI will do will be impractical in many cases, even knowing its architecture. (Otherwise it wouldn’t be superhumanly intelligent.) That’s quite compatible with being able to say that it’s worth worrying that it might do X, and that so far as we can see there is no circumstance in which it would do Y. And predictions of that kind are definitely possible without any knowledge of the architecture. (“I worry that the chess program will attack my kingside if I leave it without good defenders.” “Whatever I do, it is very unlikely that the chess program will make an illegal move.”)
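The "it will never make an illegal move" kind of prediction can even be enforced from outside, with zero knowledge of the engine's internals. A minimal sketch, using a hypothetical toy game and engine names of my own invention: the legality rule lives in a wrapper, so the guarantee holds for any engine whatsoever, including a buggy or adversarial one.

```python
import random

# Legality rule for a toy game: a move claims an empty cell of a
# 3x3 board. This rule is external to any engine.
def legal_moves(board):
    return [i for i, cell in enumerate(board) if cell is None]

def safe_play(engine, board, player):
    """Ask an opaque engine for a move; accept it only if it passes
    the externally checked legality rule."""
    move = engine(board)
    if move not in legal_moves(board):
        raise ValueError(f"engine proposed illegal move {move}")
    new_board = list(board)
    new_board[move] = player
    return new_board

# Two engines with completely different internals:
def random_engine(board):
    return random.choice(legal_moves(board))

def buggy_engine(board):
    return 0  # always plays cell 0, legal or not

board = safe_play(random_engine, [None] * 9, "X")  # accepted

occupied = ["X"] + [None] * 8
try:
    safe_play(buggy_engine, occupied, "O")
except ValueError as e:
    print(e)  # rejected: the guarantee needed no architectural knowledge
```

The prediction "no illegal move ever gets played" is established by the wrapper, not by understanding the engine; the analogous question for AI safety is which properties admit this kind of architecture-independent enforcement.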
But, sure, lots of the things that would be necessary to build a safe superhuman AI would depend on the details of how it’s designed. That’s OK; no one is building a superhuman AI yet. (So far as we know.) The point of MIRI’s work, as I understand it, is to prepare the ground for later engineering work by beginning to understand the territory.
It seems to me that automotive safety and AI safety are extremely different problems. Or, more precisely, they may or may not turn out to be, but in the imaginable cases where most is at stake (those in which it turns out that it really is possible to produce intelligences vastly greater than ours and that intelligence vastly greater than ours really does lead to much greater ability to influence the world) they are extremely different.
Piecemeal efforts are least likely to make a difference in the most dangerous, least likely scenario of a fast-takeoff singleton. But there is a societal lesson to be learnt from things like automotive safety and nuclear non-proliferation: voluntary self-restraint can be a factor.
The point of the pure theory is to help figure out what kind of engineering is going to be needed.
Lessons about engineering can be learnt from engineering, too. For instance, Big Design Up Front, the standard response to the rapidly self-improving singleton, is known to be a pretty terrible way of doing things, one that should be avoided if there are alternatives.
Negative lessons from pure theory need to be learnt, too. MIRI's standard response to the tiling agents problem is that a way will be found around the problem of simultaneous value preservation and self-modification. But why bother? If the Löbian obstacle is allowed to stand, there is no threat from a Clippie. That is a rather easily achieved form of self-restraint. You probably have to give up on the idea of a God AI benevolently ruling the world, but some of us were never that keen anyway.
Another negative lesson is that ideal rationalists are uncomputable, with the corollary that there is no one way to be a non-ideal rationalist... which leads into architecture-specific safety.
If you give me a good program that plays chess or does symbolic algebra, in a lot of situations I reckon I can predict what it will do quite well even if I know nothing about its architecture; and I’m not sure what we’ve learned about brain architecture has made a big difference to how well we can predict human behaviour. It might some day.
That can only be true in special cases. You can't in general predict a chess program that is better than you, because, if you could, you would be as good as it is.
In any case, detailed prediction is beside the point. If you want to design architecture-specific safety features, you need a broad view of how AIs of a given class would behave.
Someone’s got to have insights about how pure theory fits into the bigger picture.
I wasn’t meaning to denigrate that sort of insight. (Though “how pure theory fits in” doesn’t seem to me the same thing as “the relative usefulness of pure theory”, which is what you said before, and I think what you’re describing now sounds distinctly more valuable.) Just saying that it wasn’t the kind of insight I would look for from studying the pure theory.
And sometimes that's directly applicable, and sometimes it isn't... that's one of the big-picture issues.
In this case, I wouldn’t much expect it to be directly applicable. But I would expect it to be much easier to tell whether it is (and whether it’s indirectly applicable) once one has a reasonable quantity of theory in hand.