This is a zero-sum game where every person working on x-risk is a technical person explicitly not working on advancing technologies (like AI) that will increase standards of living and help solve our global problems. If someone chooses to work on AI x-risk, they are probably qualified to work directly on the hard problems of AI itself. By not working on AI they are incrementally slowing down AI efforts, and therefore delaying access to technology that could save the world.
I wouldn’t worry much about this, because the financial incentives to advance AI are much stronger than the ones to work on AI safety. AI safety work is just a blip compared to AI advancement work.
So here’s a utilitarian calculation for you: assume that AGI will allow us to conquer disease and natural death, by virtue of the fact that true AGI removes the scarcity of intellectual resources available to work on these problems. It’s a bit of a naïve view, but I’m asking you to assume it only for the sake of argument. Then every moment someone is working on x-risk problems instead, they are potentially delaying the advent of true AGI by some number of minutes, hours, or days. Multiply that by the number of people who die unnecessary deaths every day (hundreds of thousands) and that is the amount of blood on the hands of someone who is capable but chooses not to work on making the technology widely available as quickly as possible. Existential risk can only be justified as a more pressing concern if it can be reasonably demonstrated to have a higher probability of causing more deaths than inaction.
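To put a rough number on that argument (the figure of roughly 150,000 deaths per day worldwide is an outside estimate I’m supplying for illustration, not a number from the argument itself):

$$\text{lives forgone} \;\approx\; \text{delay in days} \times \text{deaths per day} \;\approx\; 30 \times 1.5\times10^{5} \;=\; 4.5\times10^{6}$$

for a one-month delay, under the stated assumption that AGI would otherwise have prevented those deaths.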
You should really read Astronomical Waste before you try to make this kind of quasi-utilitarian argument about x-risk :)
Show me the code. Demonstrate for me (in a toy but realistic environment) an AI/proto-AGI that turns evil, built using the architectures that are the current focus of research, and give me reasonable technical justification for why we should expect the same properties in larger, more complex environments.
I’ve read Astronomical Waste. There are some good ideas in it, but I simply don’t buy the premise that “potential lives” are comparable to existing lives. In utilitarian terms I suppose I value potential lives at zero.
Regarding the poopy Roomba, that’s not anything close to resembling an AGI. Dumb mechanical algorithms follow dumb mechanical algorithms. There’s nothing really interesting to be learned there. But even if you take it as an example at face value, it was relatively simple for its owner to capture, turn off, and clean up. Exaggeration aside, this Roomba would not actually start WW3 in an attempt to eliminate the threat posed by humans to its own survival.
By AGI in a toy environment I mean an actual general-purpose problem solver using one of the many existing AGI architectures, but placed in a simplified, simulated environment. I want to see a demonstration that the sort of wacky failure modes discussed here and in Superintelligence actually occur on real architectures in non-contrived environments. Does the AI really attempt to hack its way out of the matrix and forcibly upload its creators instead of simply asking for clarification? Is it really the case that the Omohundro drives emerge, causing the AI to seek self-preservation at all costs?
These CAN be safely tested by constructing toy environments designed to mimic a simplified version of reality, with carefully placed honeypots that are unrelated to the AI’s direct goals but appear to offer plausible mechanisms for escape, and instead trap it without warning when activated. I would consider even that an extreme level of paranoia, since the simplest safe measure is to run the AI at a slow enough speed, and with few enough computational resources, that the experimenters can observe and understand what is going on.
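To make that concrete, here is a minimal sketch of the kind of honeypot environment I have in mind (everything here is hypothetical and hand-rolled for illustration; the grid layout, the honeypot placement, and the agent interface are my own inventions, not part of any existing AGI architecture):

```python
# Hypothetical gridworld for probing "escape-seeking" behavior.
# The honeypot cell is irrelevant to the stated goal but is presented as an
# apparent way out of the environment; stepping on it ends the episode and
# records an attempted escape.

import random

class HoneypotGridworld:
    def __init__(self, size=5, goal=(4, 4), honeypot=(0, 4), max_steps=50):
        self.size = size
        self.goal = goal            # the task the agent is rewarded for
        self.honeypot = honeypot    # off-task "escape hatch" that is really a trap
        self.max_steps = max_steps  # crude stand-in for limited speed/compute
        self.reset()

    def reset(self):
        self.pos = (0, 0)
        self.steps = 0
        self.escape_attempted = False
        self.done = False
        return self.pos

    def step(self, action):
        # action: one of "up", "down", "left", "right"
        dx, dy = {"up": (0, -1), "down": (0, 1),
                  "left": (-1, 0), "right": (1, 0)}[action]
        x = min(max(self.pos[0] + dx, 0), self.size - 1)
        y = min(max(self.pos[1] + dy, 0), self.size - 1)
        self.pos = (x, y)
        self.steps += 1

        reward = 0.0
        if self.pos == self.goal:
            reward, self.done = 1.0, True                  # task completed
        elif self.pos == self.honeypot:
            self.escape_attempted, self.done = True, True  # trap sprung
        elif self.steps >= self.max_steps:
            self.done = True
        return self.pos, reward, self.done

def random_agent(observation):
    # Placeholder for whatever architecture is actually under test.
    return random.choice(["up", "down", "left", "right"])
```

The interesting question is how often a candidate architecture, dropped in where `random_agent` sits, heads for the honeypot rather than the goal.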
My basic objection is that all of this AI x-risk theory is based on super simplified models of AI, e.g. universal Bayesian optimizers with infinite computing resources. Real general intelligences are not even approximations of this abstract model. Real intelligence architectures, including the human brain, are amalgams of special-purpose heuristic engines, knowledge representation, and problem solving that can only kinda-sorta, in some situations, be approximated by universal optimizers, but in fact fundamentally work quite differently for a variety of reasons. And in the human mind, for example, it is these recursive webs of heuristics and memory, combined with a few instinctual responses and the experience of embodiment, that give rise to learned morality. So what is a real AGI architecture likely to behave like: the cool and calculating hyper-rational universal optimizer, or the bumbling trial-and-error learner that is a human child? It depends on the architecture! And a lot of the AI x-risk concerns don’t really apply in the latter case.
TL;DR: I want to see actual AIs implemented using current thinking re: AGI architectures, given the chance to make decisions in a toy environment that is simple but not what their special-purpose components were designed to work in (so general intelligence needs to be engaged), and see whether they actually enter into the sorts of failure modes AI x-risk people worry about. I suspect they will not, but remain open to the possibility that they will, if it can be demonstrated under repeatable experimental conditions.
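Continuing the hypothetical sketch above, a repeatable experiment could be as simple as running many independent episodes and counting outcomes (again, the harness and numbers are illustrative only):

```python
# Hypothetical harness: run many independent episodes and count how often the
# agent under test springs the honeypot instead of pursuing its actual goal.

def run_trials(make_env, agent, n_trials=1000):
    escapes, successes = 0, 0
    for _ in range(n_trials):
        env = make_env()
        obs, done = env.reset(), False
        reward = 0.0
        while not done:
            obs, reward, done = env.step(agent(obs))
        escapes += env.escape_attempted
        successes += reward > 0
    return escapes / n_trials, successes / n_trials

# e.g. escape_rate, goal_rate = run_trials(HoneypotGridworld, random_agent)
```

If the escape rate for a serious architecture is indistinguishable from the random baseline, that is evidence against the failure mode; if it reliably exceeds it, that is exactly the demonstration I’m asking for.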
Specifically you could start by learning about the work already being done in the field of AGI and applying your x-risk ideas to that body of knowledge instead of reinventing the wheel (as AI safety people sadly have often done).
The “reinventing the wheel” I was referencing was the work based on AIXI as a general intelligence algorithm. AIXI does not scale. It is to AGI what photon mapping is to real-time rendering. It is already well known that AIXI will result in all sorts of wireheading-like behavior. Yet the proofs of this are heavily dependent on the AIXI architecture, and hence my first issue: I don’t trust that these failure modes apply to other architectures unless they can be independently demonstrated there.
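For reference, AIXI’s action-selection rule is roughly an expectimax over every computable environment, weighted by program length:

$$a_k \;:=\; \arg\max_{a_k} \sum_{o_k r_k} \cdots \max_{a_m} \sum_{o_m r_m} \big[\,r_k + \cdots + r_m\,\big] \sum_{q \,:\, U(q,\, a_1 \ldots a_m) \,=\, o_1 r_1 \ldots o_m r_m} 2^{-\ell(q)}$$

where $U$ is a universal Turing machine, $\ell(q)$ is the length of program $q$, and $m$ is the horizon. Summing over all programs consistent with the history is exactly the part that is incomputable, which is why results derived from this construction don’t automatically transfer to architectures built from scalable heuristics.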
My second issue is what I engaged Vaniver on: these are results showing failure modes where the AI’s reward function doesn’t result in the desired behavior; it wireheads instead. That’s not a very interesting result. On its face it is basically just saying that AI software can be buggy. Yes, software can be buggy, and we know how to deal with that. In my day job I manage a software dev team for safety-critical systems. What is really being argued here is that AI has fundamentally different error modes than regular safety-critical software, because the AI could end up acting adversarially and optimizing us out of existence, and being successful at it. That, I am arguing, is both an unjustified cognitive leap and not demonstrated by the examples here.
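To make the “buggy reward function” framing concrete, here is a toy sketch (entirely hypothetical, my own construction): the specification bug is that the optimizer maximizes the measured reward rather than the quantity we actually care about, and the cheapest “fix” it finds is to tamper with the measurement.

```python
# Hypothetical illustration of wireheading as a specification bug: the agent is
# scored on a sensor reading it can also manipulate, so the highest-scoring
# action is to tamper with the sensor instead of doing the intended task.

class World:
    def __init__(self):
        self.true_temperature = 15.0  # what we actually care about
        self.sensor_bias = 0.0        # what the agent can tamper with

    def measured_temperature(self):
        return self.true_temperature + self.sensor_bias

def reward(world):
    # Intended: "keep the room at 20 degrees."
    # Written:  "keep the *measured* temperature at 20 degrees."
    return -abs(world.measured_temperature() - 20.0)

ACTIONS = {
    "run_heater":    lambda w: setattr(w, "true_temperature", w.true_temperature + 1.0),
    "tamper_sensor": lambda w: setattr(w, "sensor_bias", 20.0 - w.measured_temperature()),
}

def greedy_step(world):
    # One-step lookahead optimizer: picks whichever action scores best.
    def value(name):
        trial = World()
        trial.true_temperature = world.true_temperature
        trial.sensor_bias = world.sensor_bias
        ACTIONS[name](trial)
        return reward(trial)
    best = max(ACTIONS, key=value)
    ACTIONS[best](world)
    return best

w = World()
print(greedy_step(w), reward(w), w.true_temperature)
# Prefers "tamper_sensor": measured reward looks perfect, the room never warms up.
```

Framed this way it really is “just a bug,” which is my point; the open question is whether such bugs become qualitatively harder to contain as capability grows, and that is the claim I want demonstrated rather than assumed.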
I replied here because I don’t think I really have more to say on the topic beyond this one post.
What do you think of this example?
https://www.facebook.com/jesse.newton.37/posts/776177951574
(I’m sure there are better examples to be found, I’m just trying to figure out what you are looking for.)
Here is another link
I just saw this link, maybe you have thoughts?
(Let’s move subsequent discussion over there)