You describe the arguments of AI safety advocates as being handwavey and lacking rigor. Do you believe you have arguments for why AI safety should not be a concern that are more rigorous? If not, do you think there’s a reason why we should privilege your position?
Most of the arguments I’ve heard from you are arguments that AI is going to progress slowly. I haven’t heard arguments from AI safety advocates that AI will progress quickly, so I’m not sure there is a disagreement. I’ve heard arguments that AI may progress quickly, but a few anecdotes about instances of slow progress strike me as a pretty handwavey/non-rigorous response. I could just as easily provide anecdotes of unexpectedly quick progress (e.g. AIs able to beat humans at Go arrived ~10 years ahead of schedule). Note that the claim you are going for is a substantially stronger one than the one I hear from AI safety folks: you’re saying that we can be confident that things will play out in one particular way, and AI safety people say that we should be prepared for the possibility that things play out in a variety of different ways.
FWIW, I’m pretty sure Bostrom’s thinking on AI predates Less Wrong by quite a bit.
“Do you think there’s a reason why we should privilege your position” was probably a bad question to ask because people can argue forever about which side “should” have the burden of proof without actually making progress resolving a disagreement. A statement like
The burden of proof therefore belongs to those who propose restrictive measures.
...is not one that we can demonstrate to be true or false through some experiment or deductive argument. When a bunch of transhumanists get together to talk about the precautionary principle, it’s unsurprising that they’ll come up with something that embeds the opposite set of values.
BTW, what specific restrictive measures do you see the AI safety folks proposing? From Scott Alexander’s AI Researchers on AI Risk:
The “skeptic” position seems to be that, although we should probably get a couple of bright people to start working on preliminary aspects of the problem, we shouldn’t panic or start trying to ban AI research.
The “believers”, meanwhile, insist that although we shouldn’t panic or start trying to ban AI research, we should probably get a couple of bright people to start working on preliminary aspects of the problem.
(Control-f ‘controversy’ in the essay to get more thoughts along the same lines)
Like Max More, I’m a transhumanist. But I’m also a utilitarian. If you are too, maybe we can have a productive discussion where we work from utilitarianism as a shared premise.
As a utilitarian, I find Nick Bostrom’s argument for existential risk minimization pretty compelling. Do you have thoughts?
Note Bostrom doesn’t necessarily think we should be biased towards slow tech progress:
...instead of thinking about sustainability as is commonly known, as this static concept that has a stable state that we should try to approximate, where we use up no more resources than are regenerated by the natural environment, we need, I think, to think about sustainability in dynamical terms, where instead of reaching a state, we try to enter and stay on a trajectory that is indefinitely sustainable in the sense that we can contain it to travel on that trajectory indefinitely and it leads in a good direction.
So speaking from a utilitarian perspective, I don’t see good reasons to have a strong pro-tech prior or a strong anti-tech prior. Tech has brought us both disease reduction and nuclear weapons.
Predicting the future is unsolved in the general case. Nevertheless, I agree with Max More that we should do the best we can, and in fact one of the most serious attempts I know of to forecast AI has come out of the AI safety community: http://aiimpacts.org/ Do you know of any comparable effort being made by people unconcerned with AI safety?
I’m not a utilitarian. Sorry to be so succinct in reply to what was obviously a well written and thoughtful comment, but I don’t have much to say with respect to utilitarian arguments over AI x-risk because I never think about such things.
Regarding your final points, I think the argument can be convincingly made—and has been made by Steven Pinker and others—that technology has overwhelmingly been beneficial to the people of this planet Earth in reducing per-capita disease & violence. Technology has for the most part cured disease, not “brought it”, and nuclear weapons have kept conflicts localized in scale since 1945. There’s been some horrors since WW2, to be sure, but nothing on the scale of either the 1st or 2nd world war, at least not in global conflict among countries allied with adversarial nuclear powers. Nuclear weapons have probably saved far more lives in the generations that followed than the combined populations of Hiroshima and Nagasaki (to say nothing of the lives spared by an early end to that war). Even where technology has been failing us—climate change, for example—it is future technology that holds the potential to save us and the sooner we develop it the better.
All things being equal, it is my own personal opinion that the most noble thing a person can do is to push forward the wheels of progress and help us through the grind of leveling up our society as quickly as possible, to relieve pain and suffering and bring greater prosperity to the world’s population. And before you say “we don’t want to slow progress, we just want some people to focus on x-risk as well” keep in mind that the global pool of talent is limited. This is a zero-sum game where every person working on x-risk is a technical person explicitly not working on advancing technologies (like AI) that will increase standards of living and help solve our global problems. If someone chooses to work on AI x-risk, they are probably qualified to work directly on the hard problems of AI itself. By not working on AI they are incrementally slowing down AI efforts, and therefore delaying access to technology that could save the world.
So here’s a utilitarian calculation for you: assume that AGI will allow us to conquer disease and natural death, by virtue of the fact that true AGI removes scarcity of intellectual resources to work on these problems. It’s a bit of a naïve view, but I’m asking you to assume it only for the sake of argument. Then every moment someone is working on x-risk problems instead, they are potentially delaying the advent of true AGI by some number of minutes, hours, or days. Multiply that by the number of people who die unnecessary deaths every day—hundreds of thousands—and that is the amount of blood on the hands of someone who is capable but chooses not to work on making the technology widely available as quickly as possible. Existential risk can only be justified as a more pressing concern if can be reasonably demonstrated to have a higher probability of causing more deaths than inaction.
The key word there is reasonable. I have too much experience in this world building real things to accept arguments based on guesswork or convoluted philosophy. Show me the code. Demonstrate for me (in a toy but realistic environment) an AI/proto-AGI that turns evil, built using the architectures that are the current focus of research, and give me reasonable technical justification for why we should expect the same properties in larger, more complex environments. Without actual proof I will forever remain unconvinced, because in my experience there are just too many bullshit justifications one can create which pass internal review, and even convince a panel of experts, but fall apart as soon as it tested by reality.
Which brings me to the point I made above: you think you know how AI of the sort people are working on will go evil/non-friendly and destroy the world? Well go build one in a box and write a paper about it. But until you actually do that, and show me a replicatable experiment, I’m really not interested. I’ll go back to setting an ignore bit on all this AI x-risk nonsense and keep pushing the wheel of progress forward before that body count rises too far.
This is a zero-sum game where every person working on x-risk is a technical person explicitly not working on advancing technologies (like AI) that will increase standards of living and help solve our global problems. If someone chooses to work on AI x-risk, they are probably qualified to work directly on the hard problems of AI itself. By not working on AI they are incrementally slowing down AI efforts, and therefore delaying access to technology that could save the world.
I wouldn’t worry much about this, because the financial incentives to advance AI are much stronger than the ones to work on AI safety. AI safety work is just a blip compared to AI advancement work.
So here’s a utilitarian calculation for you: assume that AGI will allow us to conquer disease and natural death, by virtue of the fact that true AGI removes scarcity of intellectual resources to work on these problems. It’s a bit of a naïve view, but I’m asking you to assume it only for the sake of argument. Then every moment someone is working on x-risk problems instead, they are potentially delaying the advent of true AGI by some number of minutes, hours, or days. Multiply that by the number of people who die unnecessary deaths every day—hundreds of thousands—and that is the amount of blood on the hands of someone who is capable but chooses not to work on making the technology widely available as quickly as possible. Existential risk can only be justified as a more pressing concern if can be reasonably demonstrated to have a higher probability of causing more deaths than inaction.
You should really read Astronomical Waste before you try to make this kind of quasi-utilitarian argument about x-risk :)
Show me the code. Demonstrate for me (in a toy but realistic environment) an AI/proto-AGI that turns evil, built using the architectures that are the current focus of research, and give me reasonable technical justification for why we should expect the same properties in larger, more complex environments.
I’ve read Astronomical Waste. There’s some good ideas in it, but I simply don’t buy the premise that “potential lives” are comparable to existing lives. In utilitarian terms I suppose I value potential lives at zero.
Regarding the poopy Roomba, that’s not anything close to resembling an AGI. Dumb mechanical algorithms follow dumb mechanical algorithms. There’s nothing really interesting to be learned there. But even if you take it as an example at face value, it was relatively simple for its owner to capture, turn off, and clean up. Exaggeration aside, this Roomba would not actually start WW3 in an attempt to eliminate the threat posed by humans to its own survival.
By AGI in a toy environment I mean an actual general-purpose problem solver using one of the many existing AGI architectures, but placed in a simplified, simulated environment. I want ot see a demonstration that the sort of wacky failure modes discussed here and in Superintelligence actually occur on real architectures in non-contrived environments. Does the AI really attempt to hack its way out of the matrix and forceably upload its creators instead of simply asking for clarification? Is it really the case that the Omohundro drives emerge causing the AI to seek self-preservation at all costs?
These CAN be safely tested by constructing toy environments designed to mimic a simplified version of reality, with carefully placed honeypots that are unrelated to the AI’s direct goals but plausibly provide mechanisms for escape but instead trap without warning when activated. I would consider even that an extreme level of paranoia since the simplest safe measure is to run the AI at slow enough speed and computational resources that the experimenters can observe and understand what is going on.
My basic objection is that all of this AI x-risk theory is based on super simplified models of AI, e.g. universal bayesian optimizers with infinite computing resources. Real general intelligences are not even approximations of this abstract model. Real intelligence architectures, including the human brain, are amalgams of special purpose heuristic engines, knowledge representation, and problem solving that can only kinda-sorta in some situations be approximated by universal optimizers but in fact fundamentally work quite differently for a variety of reasons. And in the human mind, for example, it is these recursive webs of heuristics and memory combined with a few instinctual responses and the experience of embodiment to give rise to learned morality. So what is a real AGI architecture likely to behave like—the cool and calculating hyper-rational universal optimizer, or the bumbling learn-by-trial-and-error of a human child? It depends on the architecture! And a lot of the AI x-risk concerns don’t really apply in the latter case.
TL;DR: I want to see actual AIs implemented using current thinking re: AGI architectures, given the chance to make decisions in a toy environment that is simple but not what their special purpose components were designed to work in (so general intelligence needs to be engaged), and see whether they actually enter into the sorts of failure modes AI x-risk people worry about. I suspect they will not, but remain open to the possibility they will if only it can be demonstrated under repeatable experimental conditions.
Specifically you could start by learning about the work already being done in the field of AGI and applying your x-risk ideas to that body of knowledge instead of reinventing the wheel (as AI safety people sadly have often done).
The “reinventing the wheel” I was referencing was the work based on AIXI as a general intelligence algorithm. AIXI does not scale. It is to AGI what photon mapping is to real-time rendering. It is already well known that AIXI will result in all sorts of wireheading like behavior. Yet the proofs of this are heavily dependent on the AIXI architecture, and hence my first issue: I don’t trust that these failure modes apply to other architectures unless they can be independently demonstrated there.
My second issue is what I engaged Vaniver on: these are results showing failure modes where the AI’s reward function doesn’t result in the desired behavior—it wireheads instead. That’s not a very interesting result. On the face it is basically just saying that AI software can be buggy. Yes software can be buggy, and we know how to deal with that. In my day job I manage a software dev team for safety critical systems. What is really being argued here is that AI has fundamentally different error modes than regular safety-critical software, because the AI could end up acting adversarialy and optimizing us out of existence, and being successful at it. That, I am arguing, is both an unjustified cognitive leap and not demonstrated by the examples here.
I replied here because I don’t think I really have more to say on the topic beyond this one post.
You describe the arguments of AI safety advocates as being handwavey and lacking rigor. Do you believe you have arguments for why AI safety should not be a concern that are more rigorous? If not, do you think there’s a reason why we should privilege your position?
Most of the arguments I’ve heard from you are arguments that AI is going to progress slowly. I haven’t heard arguments from AI safety advocates that AI will progress quickly, so I’m not sure there is a disagreement. I’ve heard arguments that AI may progress quickly, but a few anecdotes about instances of slow progress strike me as a pretty handwavey/non-rigorous response. I could just as easily provide anecdotes of unexpectedly quick progress (e.g. AIs able to beat humans at Go arrived ~10 years ahead of schedule). Note that the claim you are going for is a substantially stronger one than the one I hear from AI safety folks: you’re saying that we can be confident that things will play out in one particular way, and AI safety people say that we should be prepared for the possibility that things play out in a variety of different ways.
FWIW, I’m pretty sure Bostrom’s thinking on AI predates Less Wrong by quite a bit.
Yes, the proactionary principle:
http://www.maxmore.com/proactionary.html
I don’t like the precautionary principle either, but reversed stupidity is not intelligence.
“Do you think there’s a reason why we should privilege your position” was probably a bad question to ask because people can argue forever about which side “should” have the burden of proof without actually making progress resolving a disagreement. A statement like
...is not one that we can demonstrate to be true or false through some experiment or deductive argument. When a bunch of transhumanists get together to talk about the precautionary principle, it’s unsurprising that they’ll come up with something that embeds the opposite set of values.
BTW, what specific restrictive measures do you see the AI safety folks proposing? From Scott Alexander’s AI Researchers on AI Risk:
(Control-f ‘controversy’ in the essay to get more thoughts along the same lines)
Like Max More, I’m a transhumanist. But I’m also a utilitarian. If you are too, maybe we can have a productive discussion where we work from utilitarianism as a shared premise.
As a utilitarian, I find Nick Bostrom’s argument for existential risk minimization pretty compelling. Do you have thoughts?
Note Bostrom doesn’t necessarily think we should be biased towards slow tech progress:
http://www.stafforini.com/blog/bostrom/
So speaking from a utilitarian perspective, I don’t see good reasons to have a strong pro-tech prior or a strong anti-tech prior. Tech has brought us both disease reduction and nuclear weapons.
Predicting the future is unsolved in the general case. Nevertheless, I agree with Max More that we should do the best we can, and in fact one of the most serious attempts I know of to forecast AI has come out of the AI safety community: http://aiimpacts.org/ Do you know of any comparable effort being made by people unconcerned with AI safety?
I’m not a utilitarian. Sorry to be so succinct in reply to what was obviously a well written and thoughtful comment, but I don’t have much to say with respect to utilitarian arguments over AI x-risk because I never think about such things.
Regarding your final points, I think the argument can be convincingly made—and has been made by Steven Pinker and others—that technology has overwhelmingly been beneficial to the people of this planet Earth in reducing per-capita disease & violence. Technology has for the most part cured disease, not “brought it”, and nuclear weapons have kept conflicts localized in scale since 1945. There’s been some horrors since WW2, to be sure, but nothing on the scale of either the 1st or 2nd world war, at least not in global conflict among countries allied with adversarial nuclear powers. Nuclear weapons have probably saved far more lives in the generations that followed than the combined populations of Hiroshima and Nagasaki (to say nothing of the lives spared by an early end to that war). Even where technology has been failing us—climate change, for example—it is future technology that holds the potential to save us and the sooner we develop it the better.
All things being equal, it is my own personal opinion that the most noble thing a person can do is to push forward the wheels of progress and help us through the grind of leveling up our society as quickly as possible, to relieve pain and suffering and bring greater prosperity to the world’s population. And before you say “we don’t want to slow progress, we just want some people to focus on x-risk as well” keep in mind that the global pool of talent is limited. This is a zero-sum game where every person working on x-risk is a technical person explicitly not working on advancing technologies (like AI) that will increase standards of living and help solve our global problems. If someone chooses to work on AI x-risk, they are probably qualified to work directly on the hard problems of AI itself. By not working on AI they are incrementally slowing down AI efforts, and therefore delaying access to technology that could save the world.
So here’s a utilitarian calculation for you: assume that AGI will allow us to conquer disease and natural death, by virtue of the fact that true AGI removes scarcity of intellectual resources to work on these problems. It’s a bit of a naïve view, but I’m asking you to assume it only for the sake of argument. Then every moment someone is working on x-risk problems instead, they are potentially delaying the advent of true AGI by some number of minutes, hours, or days. Multiply that by the number of people who die unnecessary deaths every day—hundreds of thousands—and that is the amount of blood on the hands of someone who is capable but chooses not to work on making the technology widely available as quickly as possible. Existential risk can only be justified as a more pressing concern if can be reasonably demonstrated to have a higher probability of causing more deaths than inaction.
The key word there is reasonable. I have too much experience in this world building real things to accept arguments based on guesswork or convoluted philosophy. Show me the code. Demonstrate for me (in a toy but realistic environment) an AI/proto-AGI that turns evil, built using the architectures that are the current focus of research, and give me reasonable technical justification for why we should expect the same properties in larger, more complex environments. Without actual proof I will forever remain unconvinced, because in my experience there are just too many bullshit justifications one can create which pass internal review, and even convince a panel of experts, but fall apart as soon as it tested by reality.
Which brings me to the point I made above: you think you know how AI of the sort people are working on will go evil/non-friendly and destroy the world? Well go build one in a box and write a paper about it. But until you actually do that, and show me a replicatable experiment, I’m really not interested. I’ll go back to setting an ignore bit on all this AI x-risk nonsense and keep pushing the wheel of progress forward before that body count rises too far.
I wouldn’t worry much about this, because the financial incentives to advance AI are much stronger than the ones to work on AI safety. AI safety work is just a blip compared to AI advancement work.
You should really read Astronomical Waste before you try to make this kind of quasi-utilitarian argument about x-risk :)
What do you think of this example?
https://www.facebook.com/jesse.newton.37/posts/776177951574
(I’m sure there are better examples to be found, I’m just trying to figure out what you are looking for.)
I’ve read Astronomical Waste. There’s some good ideas in it, but I simply don’t buy the premise that “potential lives” are comparable to existing lives. In utilitarian terms I suppose I value potential lives at zero.
Regarding the poopy Roomba, that’s not anything close to resembling an AGI. Dumb mechanical algorithms follow dumb mechanical algorithms. There’s nothing really interesting to be learned there. But even if you take it as an example at face value, it was relatively simple for its owner to capture, turn off, and clean up. Exaggeration aside, this Roomba would not actually start WW3 in an attempt to eliminate the threat posed by humans to its own survival.
By AGI in a toy environment I mean an actual general-purpose problem solver using one of the many existing AGI architectures, but placed in a simplified, simulated environment. I want ot see a demonstration that the sort of wacky failure modes discussed here and in Superintelligence actually occur on real architectures in non-contrived environments. Does the AI really attempt to hack its way out of the matrix and forceably upload its creators instead of simply asking for clarification? Is it really the case that the Omohundro drives emerge causing the AI to seek self-preservation at all costs?
These CAN be safely tested by constructing toy environments designed to mimic a simplified version of reality, with carefully placed honeypots that are unrelated to the AI’s direct goals but plausibly provide mechanisms for escape but instead trap without warning when activated. I would consider even that an extreme level of paranoia since the simplest safe measure is to run the AI at slow enough speed and computational resources that the experimenters can observe and understand what is going on.
My basic objection is that all of this AI x-risk theory is based on super simplified models of AI, e.g. universal bayesian optimizers with infinite computing resources. Real general intelligences are not even approximations of this abstract model. Real intelligence architectures, including the human brain, are amalgams of special purpose heuristic engines, knowledge representation, and problem solving that can only kinda-sorta in some situations be approximated by universal optimizers but in fact fundamentally work quite differently for a variety of reasons. And in the human mind, for example, it is these recursive webs of heuristics and memory combined with a few instinctual responses and the experience of embodiment to give rise to learned morality. So what is a real AGI architecture likely to behave like—the cool and calculating hyper-rational universal optimizer, or the bumbling learn-by-trial-and-error of a human child? It depends on the architecture! And a lot of the AI x-risk concerns don’t really apply in the latter case.
TL;DR: I want to see actual AIs implemented using current thinking re: AGI architectures, given the chance to make decisions in a toy environment that is simple but not what their special purpose components were designed to work in (so general intelligence needs to be engaged), and see whether they actually enter into the sorts of failure modes AI x-risk people worry about. I suspect they will not, but remain open to the possibility they will if only it can be demonstrated under repeatable experimental conditions.
Here is another link
I just saw this link, maybe you have thoughts?
(Let’s move subsequent discussion over there)
Earlier in this thread:
The “reinventing the wheel” I was referencing was the work based on AIXI as a general intelligence algorithm. AIXI does not scale. It is to AGI what photon mapping is to real-time rendering. It is already well known that AIXI will result in all sorts of wireheading like behavior. Yet the proofs of this are heavily dependent on the AIXI architecture, and hence my first issue: I don’t trust that these failure modes apply to other architectures unless they can be independently demonstrated there.
My second issue is what I engaged Vaniver on: these are results showing failure modes where the AI’s reward function doesn’t result in the desired behavior—it wireheads instead. That’s not a very interesting result. On the face it is basically just saying that AI software can be buggy. Yes software can be buggy, and we know how to deal with that. In my day job I manage a software dev team for safety critical systems. What is really being argued here is that AI has fundamentally different error modes than regular safety-critical software, because the AI could end up acting adversarialy and optimizing us out of existence, and being successful at it. That, I am arguing, is both an unjustified cognitive leap and not demonstrated by the examples here.
I replied here because I don’t think I really have more to say on the topic beyond this one post.