Not so much a specific misconception, but understanding the current state of AI research and understanding how mechanical most AI is (even if the mechanisms are impressive) should make you realize that being a “Friendly AI researcher” is a bit like being a unicorn tamer (and I mean that in a nice way—surely some enterprising genetic engineer will someday make unicorns).
Edit: Maybe I was being a little snarky—my meaning is simply this: given how little we know about what actual Strong AI will look like (and we genuinely know very, very little), any FAI effort will face tremendous obstacles in transforming theory into practice, both because the theory will have been developed without the guidance that real-world constraints and engineering goals provide, and because there is always overhead and R&D involved in applying theoretical research. I think many people here underestimate this vast gap.
Some people might underestimate the difficulty. On the other hand, even if doing FAI research is immensely difficult, that doesn’t mean we shouldn’t do it. The stakes are too high not to do the best we can.
I think that if we only start friendliness research when we’re obviously close to building an AGI, it will be too late.
I think that almost all research done before that will have to be thrown out. Maybe the little that isn’t will be worth it given the risks, but it will be a small amount.
How did you reach that conclusion? To me it seems very unlikely. For example, it seems there’s a good chance an AGI will have something called a “utility function”. So we can start thinking about what the correct utility function for an FAI would be even if we don’t know how to build an optimizer around it, we can study problems like decision theory to better understand the domain of the utility function, and so on.
It’s not clear at all that AGI will have a utility function. But furthermore, bolting a complex, friendly utility function onto whatever AI architecture we come up with will probably be a very difficult feat of engineering, which can’t even begin until we actually have that AI architecture.
That’s something I’m willing to take bets on. Regardless, it is precisely the type of question we better start studying right now. It is a question with high FAI-relevance which is likely to be important for AGI regardless of friendliness.
I doubt it. IMO AGI will be able to optimize any utility function; that’s what makes it an AGI. However, even if you’re right, we still need to start working on finding that utility function.
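To make that picture concrete, here is a toy sketch in Python (purely illustrative, with a made-up one-dimensional “world”; it is not a claim about real AGI architectures): the same exhaustive optimizer can be handed different utility functions, which is why one might hope to study the utility function separately from the optimizer.

```python
# Toy sketch only: a brute-force optimizer that is agnostic about which
# utility function it maximizes. The point is just that "what to optimize"
# can be studied separately from "how to optimize it".
from itertools import product

def best_action_sequence(actions, horizon, transition, utility, start_state):
    """Search all action sequences of the given length and return the one
    whose final state scores highest under the supplied utility function."""
    best_seq, best_score = None, float("-inf")
    for seq in product(actions, repeat=horizon):
        state = start_state
        for a in seq:
            state = transition(state, a)
        score = utility(state)
        if score > best_score:
            best_seq, best_score = seq, score
    return best_seq, best_score

# Hypothetical toy world: the state is a single number, actions add or subtract 1.
actions = [-1, +1]
transition = lambda state, a: state + a

# Two different "values" plugged into the same optimizer:
maximize_number = lambda state: state        # prefer large numbers
stay_near_zero = lambda state: -abs(state)   # prefer staying near zero

print(best_action_sequence(actions, 3, transition, maximize_number, 0))
print(best_action_sequence(actions, 3, transition, stay_near_zero, 0))
```

The design point is only the separation of concerns: the search procedure never needs to know which of the two utility functions it was given.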
I question both of these premises. It could be like you or me, in the sense that it simply executes a sequence of actions with no coherent or constant driving utility function (even long-term goals are often inconsistent with each other), and even if you could demonstrate to it a utility function that met some extremely high standards, it would not be persuaded to adopt it. Attempting to build in such a utility function could be possible, but not necessarily natural at all; in fact I bet it would be unnatural and difficult.
I understand your rebuttal to “friendliness research is too premature to be useful” is “It is important enough to risk being premature”, but I hope you can agree that stronger arguments would put forward stronger evidence that the risk is not particularly large.
But let’s leave that aside. I’ll concede that it is possible that developing a strong friendliness theory before strong AI could be the only path to safe AI under some circumstances.
I still think it is mistaken to ignore intermediate scenarios and focus only on that case. I wrote about this before in a post, How to Study AGIs safely, which you commented on.
I doubt the first AGI will be like this, unless you count WBE as AGI. But if it is, that’s very bad news, since it would be very difficult to make it friendly. Such an AGI is akin to an alien species which evolved under conditions vastly different from ours: it will probably have very different values.
So for example when Stuart Russell is saying that we really should get more serious about doing Friendly AI research, it’s probably because he’s a bit naive and not that familiar with the actual state of real-world AI?
I have updated my respect for MIRI significantly based on Stu Russell signing that article. (Russell is a prominent mainstream computer scientist working on related issues; as a result, I think his opinion has substantially more credibility here than the physicists’.)
If you don’t think that MIRI’s arguments are convincing, then I don’t see how one outlier could significantly shift your perception, if this person does not provide additional arguments.
I would give up most of my skepticism regarding AI risks if a significant subset of experts agreed with MIRI, even if they did not provide further arguments (although a consensus would be desirable). But one expert clearly does not suffice to make up for a lack of convincing arguments.
Also note that Peter Norvig, who coauthored ‘Artificial Intelligence: A Modern Approach’ with Russell, does not appear to be too worried.
I mean to say that if you understand the work of Russell or other AI researchers, you understand just how large the gap is between what we know and what we could possibly apply friendliness to. Friendliness research is purely aspirational and highly speculative. It’s far more pie-in-the-sky than anti-aging research, even. Nothing wrong with Russell calling for pie-in-the-sky research, of course, but I think most people don’t understand the gulf.
When somebody says something like “Google should be careful they don’t develop Skynet”, they’re demonstrating the mistaken belief that we have even the faintest notion of how to develop Skynet (and, happily, that means AI safety isn’t much of a problem).
I’ve read AIMA, but I’m not really up to speed on the last 20 years of cutting-edge AI research, which it addresses less. I don’t have the same intuition about AGI concerns being significantly more hypothetical than anti-aging stuff. For me that would mean something like “any major AGI development before 2050 or so is so improbable it’s not worth considering”, given that I’m not very optimistic about quick progress in anti-aging.
This would be my intuition if I could be sure the problem looks something like “engineer a system at least as complex as a complete adult brain”. The problem is that an AGI solution could also be “engineer a learning system that will learn to behave at human-level or above intelligence within a human lifespan or faster”, and I have much shakier intuitions about what the minimal required invention is for that to happen. It’s probably still a long way out, but I have nothing like the same certainty of it being a long way out as I have for the “directly engineer an adult-human-brain-equivalent system” case.
So, given that this whole thread is about knowing the literature better, what should I go read to build better intuition about how to estimate limits on the necessary initial complexity of learning systems?
What do you mean by the term “mechanical”?
I’m guessing Punoxysm’s pointing to the fact that the algorithms used for contemporary machine learning are pretty simple; few of them involve anything more complicated than repeated matrix multiplication at their core, although a lot of code can go into generating, filtering, and permuting their inputs.
I’m not sure that necessarily implies a lack of sophistication or potential, though. There’s a tendency to look at the human mind’s outputs and conclude that its architecture must involve comparable specialization and variety, but I suspect that’s a confusion of levels; the world’s awash in locally simple math with complex consequences. Not that I think an artificial neural network, say, is a particularly close representation of natural neurology; it pretty clearly isn’t.
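As a concrete (and deliberately minimal) illustration of the “mostly matrix multiplication” point above, here is one training loop for a tiny two-layer network in plain NumPy; the data and sizes are made up, but the structure, a few matrix products plus an elementwise nonlinearity, is representative of the core of many contemporary methods.

```python
# Minimal sketch: one tiny neural network trained by gradient descent, using
# nothing but matrix products and elementwise operations. Toy data throughout.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 10))               # 64 toy examples, 10 features each
y = (X[:, 0] > 0).astype(float)[:, None]    # arbitrary binary target to fit

W1 = rng.normal(scale=0.1, size=(10, 32))   # first-layer weights
W2 = rng.normal(scale=0.1, size=(32, 1))    # second-layer weights
lr = 0.1

for step in range(200):
    # Forward pass: two matrix multiplications with a ReLU in between.
    h = np.maximum(X @ W1, 0.0)
    pred = 1.0 / (1.0 + np.exp(-(h @ W2)))  # sigmoid output in (0, 1)

    # Backward pass (chain rule): more matrix multiplications.
    grad_logits = (pred - y) / len(X)       # cross-entropy gradient w.r.t. pre-sigmoid
    grad_W2 = h.T @ grad_logits
    grad_h = grad_logits @ W2.T
    grad_W1 = X.T @ (grad_h * (h > 0))      # ReLU passes gradient only where active

    W1 -= lr * grad_W1
    W2 -= lr * grad_W2
```

All of the “knowledge” the model acquires lives in the two weight matrices; the rest of a practical system is largely input preparation, as noted above.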
I agree with you on both counts, in particular that most human cognition is simpler than it appears. But some of it isn’t, and that’s probably the really critical part when we talk about strong AI.
For instance, I think that a computer could write a “Turing Novel” that would be indistinguishable from some human-made fiction with just a little bit of human editing, and that would still leave us quite far from FOOMable AI (I don’t mean this could happen today, but say in 10 years).