A very good question indeed. Although … there is a depressing answer.
This is a core-belief issue. For some people (like Yudkowsky and almost everyone in MIRI) artificial intelligence must be about the mathematics of artificial intelligence, but without the utility-function approach, that entire paradigm collapses. Seriously: it all comes down like a house of cards.
So, this is a textbook case of a Kuhn/Feyerabend-style clash of paradigms. It isn’t a matter of “Okay, so utility functions might not be the best approach, so let’s search for a better way to do it” … it is more a matter of “Anyone who thinks that an AI cannot be built using utility functions is a crackpot.” It is a core belief in the sense that it is not allowed to be false. It is unthinkable, so rather than try to defend it, those who deny it have to be personally attacked. (I don’t say this because of personal experience; I say it because that kind of thing has been observed over and over when paradigms come into conflict.)
Here, for example, is a message sent to the SL4 mailing list by Yudkowsky in August 2006:
Dear Richard Loosemore:
When someone doesn’t have anything concrete to say, of course they
always trot out the “paradigm” excuse.
Sincerely,
Eliezer Yudkowsky.
So the immediate answer to your question is that it will never be treated as a matter of urgency; it will be denied until all the deniers drop dead.
Meanwhile, I went beyond that problem and outlined a solution, soon after I started working in this field in the mid-80s. And by 2006 I had clarified my ideas enough to present them at the AGIRI workshop held in Bethesda that year. The MIRI (then called SIAI) crowd were there, along with a good number of other professional AI people.
The response was interesting. During my presentation the SIAI/MIRI bunch repeatedly interrupted with rude questions or pointed, very loud, laughter. Insulting laughter. Loud enough to make the other participants look over and wonder what the heck was going on.
That’s your answer, again, right there.
But if you want to know what to do about it, the paper I published after the workshop is a good place to start.
it is more a matter of “Anyone who thinks that an AI cannot be built using utility functions is a crackpot.” It is a core belief in the sense that it is not allowed to be false. It is unthinkable, so rather than try to defend it, those who deny it have to be personally attacked.
Won’t comment about past affairs, but these days at least part of MIRI seems more open to the possibility. E.g. this thread where So8res (Nate Soares, now Executive Director of MIRI) lists some possible reasons why it might be necessary to move beyond utility functions. (He is pretty skeptical of most, but at least he seems to be seriously considering the possibility, and gives a ~15% chance “that VNM won’t cut it”.)
The day that I get invited as a guest speaker by either MIRI or FHI will mark the point at which they start to respect and take seriously alternative viewpoints.
Would that be this paper? If so, it seems to me to have rather little to do with the question of whether utility functions are necessary, helpful, neutral, unhelpful, or downright inconsistent with genuinely intelligent behaviour. It argues that intelligent minds may be “complex systems” whose behaviour is very difficult to relate to their lower-level mechanisms, but something that attempts to optimize a utility function can perfectly well have that property. (Because the utility function can itself be complex in the relevant sense; or because the world is complex, so that effective optimization of even a not-so-complex utility function turns out to be achievable only by complex systems; or because even though the utility function could be optimized by something not-complex, the particular optimizer we’re looking at happens to be complex.)
My understanding of the position of EY and other people at MIRI is not that “artificial intelligence must be about the mathematics of artificial intelligence”, but that if we want to make artificially intelligent systems that might be able to improve themselves rapidly, and if we want high confidence that this won’t lead to an outcome we’d view as disastrous, the least-useless tools we have are mathematical ones.
Surely it’s perfectly possible to hold (1) that extremely capable AI might be produced by highly non-mathematical means, but (2) that this would likely be disastrous for us, so that (3) we should think mathematically about AI in the hope of finding a way of doing it that doesn’t lead to disaster. But it looks as if you are citing their belief in #3 as indicating that they violently reject #1.
So, anyway, utility functions. The following things seem to be clearly true:
There are functions whose maximization implies (at least) kinda-intelligence-like behaviour. For instance, maximizing the number of games of chess won against the world champion (in circumstances where you do actually have to play the games rather than, e.g., just killing him) requires you to be able to play chess at that level. Maximizing the profits of a company requires you to do something that resembles what the best human businesspeople do. Maximizing the number of people who regard you as a friend seems like it requires you to be good at something like ordinary social interaction. Etc.
Some of these things could probably be gamed. E.g., maybe there’s a way to make people regard you as a friend by drugging them or messing directly with their brains. If we pick difficult enough tasks, then gaming them effectively is also the kind of thing that is generally regarded as good evidence of intelligence.
The actually intelligent agents we know of (namely, ourselves and to a lesser extent animals and maybe some computer software) appear to have something a bit like utility functions. That is, we have preferences and to some extent we act so as to realize those preferences.
For real human beings in the real world, those preferences are far from being perfectly describable by any utility function. But it seems reasonable to me to describe them as being in some sense the same kind of thing as a utility function.
There are mathematical theorems that say that if you have preferences over outcomes, then certain kinda-reasonable assumptions (that can be handwavily described as “your preferences are consistent and sane”) imply that those preferences actually must be describable by a utility function. (A sketch of the standard statement is given below.)
This doesn’t mean that effective intelligent agents must literally have utility functions; after all, we are effective intelligent agents and we don’t. But it does at least suggest that if you’re trying to build an effective intelligent agent, then giving it a utility function isn’t obviously a bad idea.
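The theorem in question is the von Neumann–Morgenstern utility theorem; a rough sketch of the standard statement (nothing here is specific to MIRI or to this thread):

```latex
% Von Neumann--Morgenstern (sketch). Suppose an agent has a preference relation
% $\succeq$ over lotteries that is complete, transitive, continuous, and satisfies
% independence:
\[
  L \succeq M \;\Longrightarrow\;
  \alpha L + (1-\alpha)N \;\succeq\; \alpha M + (1-\alpha)N
  \qquad \text{for all } N \text{ and } \alpha \in (0,1].
\]
% Then there is a utility function $u$ over outcomes such that
\[
  L \succeq M \iff \mathbb{E}_{L}[u] \;\ge\; \mathbb{E}_{M}[u],
\]
% and $u$ is unique up to positive affine transformation $u \mapsto a u + b$, $a > 0$.
```

And the flip side, re the point about human preferences above: whenever the stated preferences contain a strict cycle (the classic “money pump”), no utility function can reproduce them at all. A minimal sketch in Python (toy data and function names of my own choosing):

```python
# A set of strict preferences ("better", "worse") can be reproduced by SOME utility
# function u, with u(better) > u(worse) for every stated pair, if and only if the
# preference graph is acyclic. Kahn's topological sort detects the cycle.
def representable_by_utility(items, prefers):
    worse_than = {a: [] for a in items}        # a -> items a is strictly preferred to
    indegree = {a: 0 for a in items}
    for better, worse in prefers:
        worse_than[better].append(worse)
        indegree[worse] += 1
    frontier = [a for a in items if indegree[a] == 0]
    visited = 0
    while frontier:
        a = frontier.pop()
        visited += 1
        for b in worse_than[a]:
            indegree[b] -= 1
            if indegree[b] == 0:
                frontier.append(b)
    return visited == len(items)               # leftover nodes = a cycle = no such u

print(representable_by_utility("abc", {("a", "b"), ("b", "c")}))              # True
print(representable_by_utility("abc", {("a", "b"), ("b", "c"), ("c", "a")}))  # False
```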
All of which seems to me like sufficient reason to (1) investigate AI designs that have (at least approximately) utility functions, and (2) be skeptical of any claim that having a utility function actually makes AI impossible. And it doesn’t appear to me to come down to a baseless article of faith, no matter what you and Marcus Hutter may have said to one another.
My understanding of the position of EY and other people at MIRI is not that “artificial intelligence must be about the mathematics of artificial intelligence”, but that if we want to make artificially intelligent systems that might be able to improve themselves rapidly, and if we want high confidence that this won’t lead to an outcome we’d view as disastrous, the least-useless tools we have are mathematical ones.
But there are good reasons for thinking that, in absolute terms, many mathematical methods of AI safety are useless. The problem is that they relate to ideal rationalists, but ideal rationality is uncomputable, so they are never directly applicable to any buildable AI... and how a real-world AI would deviate from ideal rationality is crucial to understanding the threats it would pose. Deviations from ideal rationality could pose new threats, or could counter certain classes of threat (in particular, lack of goal stability could be leveraged to provide corrigibility, which is a desirable safety feature).
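To make the uncomputability point concrete (this is the standard textbook example, not something spelled out in this comment): Hutter’s AIXI is the usual formalisation of an “ideal rationalist”, defined by an expectimax over a Solomonoff-style mixture of all programs:

```latex
% Sketch of Hutter's AIXI: at time t, with horizon m and universal machine U,
\[
  a_t \;=\; \arg\max_{a_t} \sum_{o_t r_t} \cdots \max_{a_m} \sum_{o_m r_m}
  \bigl[\, r_t + \cdots + r_m \,\bigr]
  \sum_{q \,:\, U(q,\, a_1 \ldots a_m) \,=\, o_1 r_1 \ldots o_m r_m} 2^{-\ell(q)}
\]
% The inner sum (a Solomonoff-style prior over all programs q consistent with the
% history so far) is incomputable, so AIXI can only ever be approximated; how a
% buildable approximation deviates from it is exactly the question raised above.
```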
Surely it’s perfectly possible to hold (1) that extremely capable AI might be produced by highly non-mathematical means, but (2) that this would likely be disastrous for us, so that (3) we should think mathematically about AI in the hope of finding a way of doing it that doesn’t lead to disaster. But it looks as if you are citing their belief in #3 as indicating that they violently reject #1.
There’s an important difference between thinking mathematically and only thinking mathematically. Highly non-mathematical AI, cobbled together without clean overriding principles, cannot be made safe by clean mathematical principles, although it could quite conceivably be made safe by piecemeal engineering solutions such as kill switches, corrigibility and better boxing… the kind of solution MIRI isn’t interested in, which does look as though they are neglecting a class of AI danger.
many mathematical methods of AI safety are useless
If any particular mathematical approach to AI safety is useless, and if MIRI are attempting to use that approach, then they are making a mistake. But we should distinguish that from a different situation where they aren’t attempting to use the useless approach but are studying it for insight. So, e.g., maybe approach X is only valid for AIs that are ideal rationalists, but they hope that some of what they discover by investigating approach X will point the way to useful approaches for not-so-ideal rationalists.
Do you have particular examples in mind? Is there good evidence telling us whether MIRI think the methods in question will be directly applicable to real AIs?
There’s an important difference between thinking mathematically and only thinking mathematically.
I agree. I am not so sure I agree that cobbled-together AI can “quite conceivably be made safe by piecemeal engineering solutions”, and I’m pretty sure that historically at least MIRI has thought it very unlikely that they can. It does seem plausible that any potentially-dangerous AI could be made at least a bit safer by such things, and I hope MIRI aren’t advocating that no such things be done. But this is all rather reminiscent of computer security, where there are crude piecemeal things you can do that help a bit, but if you want really tight security there’s no substitute for designing your system for security from the start—and one possible danger of doing the crude piecemeal things is that they give you a false sense of safety.
By 1900, the basic principles of aerodynamics in terms of lift and drag had been known for almost a century—the basic math of flight. There were two remaining problems: power and control. Powered heavier-than-air flight requires an efficient engine with a sufficient power-to-weight ratio. Combustion engine tech developed along a sigmoid, and by 1900 that tech was ready.
The remaining problem then was control. Most of the flight pioneers either didn’t understand the importance of this problem, or they thought that aircraft could be controlled like boats are—with a simple rudder mechanism. The Wright Brothers—two unknown engineers—realized that steering in 3D was more complex. They solved this problem by careful observation of bird flight. They saw that birds turned by banking their whole body (and thus leveraging the entire wing airfoil), induced through careful manipulation of the airfoil at the trailing edge of the wing. They copied this wing-warping mechanism directly in their first flying machines. Of course—they weren’t the only ones to realize all this, and ailerons are functionally equivalent but more practical for fixed-wing aircraft.
Flight was achieved by technological evolution or experimental engineering, taking some inspiration from biology. Pretty much all tech is created through steady experimental/evolutionary engineering. Machine learning is on a very similar track to produce AGI in the near term.
But this is all rather reminiscent of computer security, where there are crude piecemeal things you can do that help a bit, but if you want really tight security
Ahh, and that’s part of the problem. The first AGIs will be sub-human, then human-level, in intelligence, and Moore’s Law is about to end or has already ended, so the risk of some super-rapid SI explosion in the near term is low. Most of the world doesn’t care about tight security. AGI just needs to be as safe as or safer than humans. Tight security is probably impossible regardless—you can’t prove tight bounds on any system of extreme complexity (like the real world). Tight math bounds always require ultra-simplified models.
Where are insights about the relative usefulness of pure theory going to come from?
I am not so sure I agree that cobbled-together AI can “quite conceivably be made safe by piecemeal engineering solutions”
It’s not even conceivable? Even though automotive safety basically happened that way?
but if you want really tight security there’s no substitute for designing your system for security from the start
That’s clearly not crude hackery, but it’s not pure theory either. The kind of Clean Engineering you are talking about can only be specific to a particular architecture, which pure theory isn’t.
There is a pretty hard limit to how much you can predict about a system, AI or not, without knowing its architecture.
Where are insights about the relative usefulness of pure theory going to come from?
That wasn’t at all the sort of insight I had in mind. It’s commonplace in science to start trying to understand complicated things by first considering simpler things. Then sometimes you learn techniques that turn out to be applicable in the harder case, or obstacles that are likely still to be there in the harder case.
(Lots of computer science research has considered computers with literally unlimited memory, models of computation in which a single operation can do arbitrary arithmetic on an integer of any size, models of computation in which the cost of accessing memory doesn’t depend on how big the memory is, etc., and still managed to produce things that end up being useful for actual software running on actual computers with finite memories and finite registers running in a universe with a finite maximum speed limit.)
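(A toy illustration of that parenthetical, of my own devising: an analysis done in an idealized unit-cost model, O(log n) binary search versus O(n) linear scan, still predicts what happens on a real, finite machine.)

```python
# Complexity derived in an idealized model (unit-cost comparisons, unbounded memory)
# still predicts behaviour on real hardware: binary search vs linear scan.
import bisect
import random
import time

data = sorted(random.randrange(10**9) for _ in range(200_000))
queries = [random.randrange(10**9) for _ in range(500)]

def contains_linear(xs, x):
    return x in xs                        # O(n) per query

def contains_binary(xs, x):
    i = bisect.bisect_left(xs, x)         # O(log n) per query
    return i < len(xs) and xs[i] == x

t0 = time.perf_counter()
lin = [contains_linear(data, q) for q in queries]
t1 = time.perf_counter()
binr = [contains_binary(data, q) for q in queries]
t2 = time.perf_counter()

assert lin == binr
print(f"linear scan: {t1 - t0:.3f}s   binary search: {t2 - t1:.3f}s")
```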
It’s not even conceivable?
Well, I guess it depends on what you mean by “quite conceivable”. Obviously anyone can say “we might be able to make a cobbled-together AI safe by piecemeal engineering solutions”, so if that counts as “conceiving” then plainly it’s conceivable. But conceivability in that sense is (I think) completely uninteresting; what we should care about is whether it’s at all likely, and that’s what I took you to mean by “quite conceivable”.
It seems to me that automotive safety and AI safety are extremely different problems. Or, more precisely, they may or may not turn out to be, but in the imaginable cases where most is at stake (those in which it turns out that it really is possible to produce intelligences vastly greater than ours and that intelligence vastly greater than ours really does lead to much greater ability to influence the world) they are extremely different.
The kind of Clean Engineering you are talking about can only be specific to a particular architecture, which pure theory isn’t.
The point of the pure theory is to help figure out what kind of engineering is going to be needed.
hard limit to how much you can predict [...] without knowing its architecture
If you give me a good program that plays chess or does symbolic algebra, in a lot of situations I reckon I can predict what it will do quite well even if I know nothing about its architecture; and I’m not sure what we’ve learned about brain architecture has made a big difference to how well we can predict human behaviour. It might some day.
I’m sure that very detailed prediction of what a superhumanly intelligent AI will do will be impractical in many cases, even knowing its architecture. (Otherwise it wouldn’t be superhumanly intelligent.) That’s quite compatible with being able to say that it’s worth worrying that it might do X, and that so far as we can see there is no circumstance in which it would do Y. And predictions of that kind are definitely possible without any knowledge of the architecture. (“I worry that the chess program will attack my kingside if I leave it without good defenders.” “Whatever I do, it is very unlikely that the chess program will make an illegal move.”)
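(A small sketch of that last, architecture-free kind of prediction; my own example, using the third-party python-chess package, with `engine_move` standing in for whatever black-box engine you are handed.)

```python
# Whatever the engine's internals, the prediction "it won't play an illegal move"
# can be checked against the rules alone, with zero knowledge of its architecture.
# Requires the third-party `python-chess` package (pip install chess).
import chess

def plays_legally(engine_move, fen: str) -> bool:
    """engine_move: any black-box callable mapping a FEN position to a UCI move."""
    board = chess.Board(fen)
    try:
        move = chess.Move.from_uci(engine_move(fen))
    except ValueError:
        return False                      # output wasn't even a parseable move
    return move in board.legal_moves
```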
But, sure, lots of the things that would be necessary to build a safe superhuman AI would depend on the details of how it’s designed. That’s OK; no one is building a superhuman AI yet. (So far as we know.) The point of MIRI’s work, as I understand it, is to prepare the ground for later engineering work by beginning to understand the territory.
It seems to me that automotive safety and AI safety are extremely different problems. Or, more precisely, they may or may not turn out to be, but in the imaginable cases where most is at stake (those in which it turns out that it really is possible to produce intelligences vastly greater than ours and that intelligence vastly greater than ours really does lead to much greater ability to influence the world) they are extremely different.
Piecemeal efforts are least likely to make a difference to the most dangerous, least likely scenario of a fast-takeoff singleton. But there is a societal lesson to be learnt from things like automotive safety and nuclear non-proliferation: voluntary self-restraint can be a factor.
The point of the pure theory is to help figure out what kind of engineering is going to be needed.
Lessons about engineering can be learnt from engineering, too. For instance, Big Design Up Front, the standard response to the rapidly self-improving singleton, is known to be a pretty terrible way of doing things, one that should be avoided if there are alternatives.
Negative lessons from pure theory need to be learnt, too. MIRI’s standard response to the tiling agents problem is that a way will be found around the problem of simultaneous value preservation and self-modification. But why bother? If the Löbian obstacle is allowed to stand, there is no threat from a Clippie. That is a rather easily achieved form of self-restraint. You probably have to give up on the idea of a God AI benevolently ruling the world, but some of us were never that keen anyway.
Another negative lesson is that ideal rationalists are uncomputable, with the corollary that there is no one way to be a non-ideal rationalist... which leads into architecture-specific safety.
If you give me a good program that plays chess or does symbolic algebra, in a lot of situations I reckon I can predict what it will do quite well even if I know nothing about its architecture; and I’m not sure what we’ve learned about brain architecture has made a big difference to how well we can predict human behaviour. It might some day.
That can only be true in special cases. You can’t in general predict a chess programme that is better than you, because, if you could, you would be as good as it is.
In any case, detailed prediction is beside the point. If you want to design architecture-specific safety features, you need a broad view of how AIs of a class would behave.
Someone’s got to have insights about how pure theory fits into the bigger picture.
I wasn’t meaning to denigrate that sort of insight. (Though “how pure theory fits in” doesn’t seem to me the same thing as “the relative usefulness of pure theory”, which is what you said before, and I think what you’re describing now sounds distinctly more valuable.) Just saying that it wasn’t the kind of insight I would look for from studying the pure theory.
sometimes that’s directly applicable
In this case, I wouldn’t much expect it to be directly applicable. But I would expect it to be much easier to tell whether it is (and whether it’s indirectly applicable) once one has a reasonable quantity of theory in hand.
Meanwhile, I went beyond that problem and outlined a solution, soon after I started working in this field in the mid-80s. And by 2006 I had clarified my ideas enough to present them at the AGIRI workshop held in Bethesda that year.
Link?

Sorry, was in too much of a rush to give the link:

Loosemore, R.P.W. (2007). Complex Systems, Artificial Intelligence and Theoretical Psychology. In B. Goertzel & P. Wang (Eds.), Proceedings of the 2006 AGI Workshop. IOS Press, Amsterdam.

http://richardloosemore.com/docs/2007_ComplexSystems_rpwl.pdf
Excuse me, but as much as I think the SIAI bunch were being rude to you, if you had presented, at a serious conference on a serious topic, a paper that waves its hands, yells “Complexity! Irreducible! Parallel!” and expected a good reception, I would have been privately snarking if not publicly. That would be me acting like a straight-up asshole, but it would also be because you never try to understand a phenomenon by declaring it un-understandable. Which is not to say that symbolic, theorem-prover, “Pure Maths are Pure Reason which will create Pure Intelligence” approaches are very good either—they totally failed to predict that the brain is a universal learning machine, for instance.
(And so far, the “HEY NEURAL NETS LEARN WELL” approach is failing to predict a few things I think they really ought to be able to see, and endeavor to show.)
That anyone would ever try to claim a technological revolution is about to arise from either of those schools of work is what constantly discredits the field of artificial intelligence as a hype-driven fraud!
Okay, so I am trying to understand what you are attacking here, and I assume you mean my presentation of that paper at the 2006 AGIRI workshop.
Let me see: you reduced the entire paper to the statement that I yelled “Complexity! Irreducible! Parallel!”.
Hmmmm … that sounds like you thoroughly understood the paper and read it in great detail, because you reflected back all the arguments in the paper, showed good understanding of the cognitive science, AI and complex-systems context, and gave me a thoughtful, insightful list of comments on some of the errors of reasoning that I made in the paper.
So I guess you are right. I am ignorant. I have not been doing research in cognitive psychology, AI and complex systems for 20 years (as of the date of that workshop). I have nothing to say to defend any of my ideas at all, when people make points about what is wrong in those ideas. And, worse still, I did not make any suggestions in that paper about how to solve the problem I described, except to say “HEY NEURAL NETS LEARN WELL”.
I wish you had been around when I wrote the paper, because I could have reduced the whole thing to one 3-word and one 5-word sentence, and saved a heck of a lot of time.
P.S. I will forward your note to the Santa Fe Institute and the New England Complex Systems Institute, so they can also understand that they are ignorant. I guess we can expect an unemployment spike in Santa Fe and Boston, next month, when they all resign en masse.
I don’t see it as dogmatism so much as a verbal confusion. The ubiquity of UFs can be defended using a broad (implicit) definition, but the conclusions typically drawn about types of AI danger and methods of AI safety relate to a narrower definition, where a UF is:

explicitly coded, and/or

fixed and unupdateable, and/or

“thick”, containing detailed descriptions of goals.
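(A toy sketch, with hypothetical names of my own, of the narrow sense being contrasted here.)

```python
# Narrow sense of "UF": explicitly coded, fixed at design time, and "thick",
# i.e. spelling out the goals and trade-offs in detail.
def explicit_utility(state: dict) -> float:
    return (10.0 * state.get("paperclips", 0)       # detailed goal description
            - 0.1 * state.get("energy_used", 0)     # hard-coded trade-offs
            - 5.0 * state.get("humans_harmed", 0))  # never updated at runtime

def pick_action(actions, predict_state):
    """An agent in the narrow sense: it literally argmaxes the coded function."""
    return max(actions, key=lambda a: explicit_utility(predict_state(a)))

# In the broad sense, by contrast, any agent whose choices are consistent enough can
# be described "as if" it maximised some utility function, even if nothing in its
# implementation looks like explicit_utility at all.
```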