My understanding of the position of EY and other people at MIRI is not that “artificial intelligence must be about the mathematics of artificial intelligence”, but that if we want to make artificially intelligent systems that might be able to improve themselves rapidly, and if we want high confidence that this won’t lead to an outcome we’d view as disastrous, the least-useless tools we have are mathematical ones.
But there are good reasons for thinking that, in absolute terms, many mathematical methods of AI safety are useless. The problem is that they relate to ideal rationalists, but ideal rationality is uncomputable, so they are never directly applicable to any buildable AI… and how a real-world AI would deviate from ideal rationality is crucial to understanding the threats it would pose. Deviations from ideal rationality could pose new threats, or could counter certain classes of threat (in particular, lack of goal stability could be leveraged to provide corrigibility, which is a desirable safety feature).
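To make the uncomputability point concrete (this is standard background, not something stated in the thread): the usual formalization of an ideal inductive reasoner is Solomonoff induction, whose prior probability of an observation sequence $x$ is

```latex
M(x) \;=\; \sum_{p \,:\, U(p) = x*} 2^{-\lvert p \rvert}
```

where $U$ is a universal prefix Turing machine and the sum runs over all programs $p$ whose output begins with $x$. Evaluating $M(x)$ exactly would require knowing which programs halt with the right output, i.e. solving the halting problem, so the ideal is uncomputable; any buildable AI can only approximate it, and different approximations deviate from it in different, architecture-specific ways.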
Surely it’s perfectly possible to hold (1) that extremely capable AI might be produced by highly non-mathematical means, but (2) that this would likely be disastrous for us, so that (3) we should think mathematically about AI in the hope of finding a way of doing it that doesn’t lead to disaster. But it looks as if you are citing their belief in #3 as indicating that they violently reject #1.
There’s an important difference between thinking mathematically and only thinking mathematically. Highly non-mathematical AI, cobbled together without clean overriding principles, cannot be made safe by clean mathematical principles, although it could quite conceivably be made safe by piecemeal engineering solutions such as kill switches, corrigibility and better boxing… the kind of solution MIRI isn’t interested in… which does look as though they are neglecting a class of AI danger.
many mathematical methods of AI safety are useless
If any particular mathematical approach to AI safety is useless, and if MIRI are attempting to use that approach, then they are making a mistake. But we should distinguish that from a different situation where they aren’t attempting to use the useless approach but are studying it for insight. So, e.g., maybe approach X is only valid for AIs that are ideal rationalists, but they hope that some of what they discover by investigating approach X will point the way to useful approaches for not-so-ideal rationalists.
Do you have particular examples in mind? Is there good evidence telling us whether MIRI think the methods in question will be directly applicable to real AIs?
There’s an important difference between thinking mathematically and only thinking mathematically.
I agree. I am not so sure I agree that cobbled-together AI can “quite conceivably be made safe by piecemeal engineering solutions”, and I’m pretty sure that historically at least MIRI has thought it very unlikely that they can. It does seem plausible that any potentially-dangerous AI could be made at least a bit safer by such things, and I hope MIRI aren’t advocating that no such things be done. But this is all rather reminiscent of computer security, where there are crude piecemeal things you can do that help a bit, but if you want really tight security there’s no substitute for designing your system for security from the start—and one possible danger of doing the crude piecemeal things is that they give you a false sense of safety.
There’s an important difference between thinking mathematically and only thinking mathematically.
I agree. I am not so sure I agree that cobbled-together AI can “quite conceivably be made safe by piecemeal engineering solutions”, and I’m pretty sure that historically at least MIRI has thought it very unlikely that they can. It does seem plausible that any potentially-dangerous AI could be made at least a bit safer by such things, and I hope MIRI aren’t advocating that no such things be done.
By 1900, the basic principles of aerodynamics in terms of lift and drag, the basic math of flight, had been known for almost a century. There were two remaining problems: power and control. Powered heavier-than-air flight requires an efficient engine with a sufficient power-to-weight ratio. Combustion engine tech developed along a sigmoid, and by 1900 that tech was ready.
The remaining problem, then, was control. Most of the flight pioneers either didn’t understand the importance of this problem, or thought that aircraft could be controlled like boats, with a simple rudder mechanism. The Wright brothers, two unknown engineers, realized that steering in 3D was more complex. They solved this problem by careful observation of bird flight: they saw that birds turn by banking their whole body (and thus leveraging the entire wing airfoil), induced through careful manipulation of the trailing edge of the wing. They copied this wing-warping mechanism directly in their first flying machines. Of course, they weren’t the only ones to realize all this, and ailerons are functionally equivalent but more practical for fixed-wing aircraft.
Flight was achieved by technological evolution or experimental engineering, taking some inspiration from biology. Pretty much all tech is created through steady experimental/evolutionary engineering. Machine learning is on a very similar track to produce AGI in the near term.
But this is all rather reminiscent of computer security, where there are crude piecemeal things you can do that help a bit, but if you want really tight security
Ah, and that’s part of the problem. The first AGIs will be sub-human, then human-level, in intelligence, and Moore’s Law is about to end or has already ended, so the risk of some super-rapid SI explosion in the near term is low. Most of the world doesn’t care about tight security. AGI just needs to be as safe as, or safer than, humans. Tight security is probably impossible regardless: you can’t prove tight bounds on any system of extreme complexity (like the real world). Tight mathematical bounds always require ultra-simplified models.
Where are insights about the relative usefulness of pure theory going to come from?
I am not so sure I agree that cobbled-together AI can “quite conceivably be made safe by piecemeal engineering solutions”
It’s not even conceivable? Even though automotive safety basically happened that way?
but if you want really tight security there’s no substitute for designing your system for security from the start
That’s clearly not crude hackery, but it’s not pure theory either. The kind of Clean Engineering you are talking about can only be specific to a particular architecture, which pure theory isn’t.
There is a pretty hard limit to how much you can predict about a system, AI or not, without knowing its architecture.
Where are insights about the relative usefulness of pure theory going to come from?
That wasn’t at all the sort of insight I had in mind. It’s commonplace in science to start trying to understand complicated things by first considering simpler things. Then sometimes you learn techniques that turn out to be applicable in the harder case, or obstacles that are likely still to be there in the harder case.
(Lots of computer science research has considered computers with literally unlimited memory, models of computation in which a single operation can do arbitrary arithmetic on an integer of any size, models of computation in which the cost of accessing memory doesn’t depend on how big the memory is, etc., and still managed to produce things that end up being useful for actual software running on actual computers with finite memories and finite registers running in a universe with a finite maximum speed limit.)
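As a small illustration of that point (my example, not the commenter’s): binary search is normally analyzed in an idealized RAM model with unbounded integers and unit-cost comparisons and memory access, yet the algorithm and its O(log n) guarantee transfer directly to real, finite machines.

```python
# Sketch: an algorithm derived under an idealized model of computation
# (unbounded integers, unit-cost operations) that remains correct and
# efficient on real finite hardware.
def binary_search(sorted_items, target):
    """Return the index of target in sorted_items, or -1 if absent."""
    lo, hi = 0, len(sorted_items) - 1
    while lo <= hi:
        mid = (lo + hi) // 2  # Python ints don't overflow; in C you'd write lo + (hi - lo) // 2
        if sorted_items[mid] == target:
            return mid
        elif sorted_items[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1
```

The theory was worked out for an idealized machine, but the finite-machine version needs only a minor adjustment (guarding against integer overflow), which is the sort of transfer the parenthetical above is describing.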
It’s not even conceivable?
Well, I guess it depends on what you mean by “quite conceivable”. Obviously anyone can say “we might be able to make a cobbled-together AI safe by piecemeal engineering solutions”, so if that counts as “conceiving” then plainly it’s conceivable. But conceivability in that sense is (I think) completely uninteresting; what we should care about is whether it’s at all likely, and that’s what I took you to mean by “quite conceivable”.
It seems to me that automotive safety and AI safety are extremely different problems. Or, more precisely, they may or may not turn out to be, but in the imaginable cases where most is at stake (those in which it turns out that it really is possible to produce intelligences vastly greater than ours and that intelligence vastly greater than ours really does lead to much greater ability to influence the world) they are extremely different.
The kind of Clean Engineering you are talking about can only be specific to a particular architecture, which pure theory isn’t.
The point of the pure theory is to help figure out what kind of engineering is going to be needed.
hard limit to how much you can predict [...] without knowing its architecture
If you give me a good program that plays chess or does symbolic algebra, in a lot of situations I reckon I can predict what it will do quite well even if I know nothing about its architecture; and I’m not sure what we’ve learned about brain architecture has made a big difference to how well we can predict human behaviour. It might some day.
I’m sure that very detailed prediction of what a superhumanly intelligent AI will do will be impractical in many cases, even knowing its architecture. (Otherwise it wouldn’t be superhumanly intelligent.) That’s quite compatible with being able to say that it’s worth worrying that it might do X, and that so far as we can see there is no circumstance in which it would do Y. And predictions of that kind are definitely possible without any knowledge of the architecture. (“I worry that the chess program will attack my kingside if I leave it without good defenders.” “Whatever I do, it is very unlikely that the chess program will make an illegal move.”)
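The chess example can be sketched in code (a hypothetical toy of my own, not anything from the thread): given only the rules interface of a game, we can verify the architecture-free prediction “this engine will never make an illegal move” for two engines with completely different internals.

```python
import random

# Hypothetical sketch: an architecture-independent behavioural "envelope".
# We know nothing about how each engine chooses among legal moves, yet we
# can still predict that neither will ever play an occupied square.

def legal_moves(board):
    """Board is a list of 9 cells ('x', 'o', or None); legal = empty cells."""
    return [i for i, cell in enumerate(board) if cell is None]

def random_engine(board):
    """An engine whose internals are opaque randomness."""
    return random.choice(legal_moves(board))

def greedy_engine(board):
    """A different architecture: always takes the lowest-numbered empty cell."""
    return legal_moves(board)[0]

def envelope_holds(engine, board, trials=100):
    """Check the architecture-free prediction: the chosen move is always legal."""
    return all(engine(board) in legal_moves(board) for _ in range(trials))
```

The prediction is derived from the rules the agent operates under, not from its internals, which is exactly the kind of architecture-independent claim the chess examples above are making.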
But, sure, lots of the things that would be necessary to build a safe superhuman AI would depend on the details of how it’s designed. That’s OK; no one is building a superhuman AI yet. (So far as we know.) The point of MIRI’s work, as I understand it, is to prepare the ground for later engineering work by beginning to understand the territory.
It seems to me that automotive safety and AI safety are extremely different problems. Or, more precisely, they may or may not turn out to be, but in the imaginable cases where most is at stake (those in which it turns out that it really is possible to produce intelligences vastly greater than ours and that intelligence vastly greater than ours really does lead to much greater ability to influence the world) they are extremely different.
Piecemeal efforts are least likely to make a difference in the most dangerous, least likely scenario of a fast-takeoff singleton. But there is a societal lesson to be learnt from things like automotive safety and nuclear non-proliferation: voluntary self-restraint can be a factor.
The point of the pure theory is to help figure out what kind of engineering is going to be needed.
Lessons about engineering can be learnt from engineering, too. For instance, Big Design Up Front, the standard response to the rapidly self-improving singleton, is known to be a pretty terrible way of doing things that should be avoided if there are alternatives.
Negative lessons from pure theory need to be learnt, too. MIRI’s standard response to the tiling agents problem is that a way will be found around the problem of simultaneous value preservation and self-modification. But why bother? If the Löbian obstacle is allowed to stand, there is no threat from a Clippy. That is a rather easily achieved form of self-restraint. You probably have to give up on the idea of a God AI benevolently ruling the world, but some of us were never that keen anyway.
Another negative lesson is that ideal rationalists are uncomputable, with the corollary that there is no one way to be a non-ideal rationalist… which leads into architecture-specific safety.
If you give me a good program that plays chess or does symbolic algebra, in a lot of situations I reckon I can predict what it will do quite well even if I know nothing about its architecture; and I’m not sure what we’ve learned about brain architecture has made a big difference to how well we can predict human behaviour. It might some day.
That can only be true in special cases. You can’t in general predict a chess program that is better than you, because if you could, you would be as good as it is.
In any case, detailed prediction is beside the point. If you want to design architecture-specific safety features, you need a broad view of how AIs of a class would behave.
Someone’s got to have insights about how pure theory fits into the bigger picture.
I wasn’t meaning to denigrate that sort of insight. (Though “how pure theory fits in” doesn’t seem to me the same thing as “the relative usefulness of pure theory”, which is what you said before, and I think what you’re describing now sounds distinctly more valuable.) Just saying that it wasn’t the kind of insight I would look for from studying the pure theory.
And sometimes that’s directly applicable, and sometimes it isn’t… that’s one of the big-picture issues.
In this case, I wouldn’t much expect it to be directly applicable. But I would expect it to be much easier to tell whether it is (and whether it’s indirectly applicable) once one has a reasonable quantity of theory in hand.