The paperclip maximizer thought experiment makes a lot of people pattern-match AI risk to science fiction. Do you know of any AI-risk-related thought experiments that avoid that?
Major AI risk is science fiction—that is, it’s the kind of thing science-fiction stories get written about, and it isn’t something we have experience of yet outside fiction. I don’t see how any thought experiment that seriously engages with the issue could not pattern-match to science fiction.
There is a field that thinks hard about risks from unintelligent computers (computer security) that tackles very difficult problems that sometimes get written about in popular fiction (Neal Stephenson, etc.) and manages not to look silly.
I think to the extent that (U)FAI research is even a “real area,” it would be closest in mindset to computer security.
Computer security as portrayed on TV frequently does look silly.
This is a fully general counterargument: “X as portrayed on TV frequently does look silly”
A fully general argument is not “an argument where you can substitute something for X and get something grammatical”. Not all things look silly on TV with the same frequency.
I endorse Lumifer’s quibble about the field of computer security, with the caveat that often the fact that the risks happen inside computer systems is much more important than the fact that they come from people.
The sort of “value alignment” questions MIRI professes (I think sincerely) to worry about seem to me a long way away from computer security, and plausibly relevant to future AI safety. But it could well be that if AI safety really depends on nailing that sort of thing down then we’re unfixably screwed and we should therefore concentrate on problems there is at least some possibility of solving...
I think my point wasn’t about what computer security precisely does, but about the mindset of people who do it (security people cultivate an adversarial point of view about systems).
My secondary point is that computer security is a very solid field, and doesn’t look wishy-washy or science-fictiony. It has serious conferences, research centers, industry labs, intellectual firepower, etc.
I’m not sure how much there is to learn from the field of computer security with regard to the OP’s question. It’s relatively easy to cultivate an adversarial mindset and get funding for conferences, research centers, labs, intellectual firepower, etc., when the adversaries exist at the present time and are causing billions of dollars of damage each year. How do you do that when the analogous adversaries are not expected to exist for a decade or more, and we expect it will be too late to get started once they do exist?
...Can we consider computer security a success story at all? I admit I am not a professional security researcher, but between Bitcoin, the DNMs, and my own interest in computer security & crypto, I read a great deal on these topics, and from watching it in real time I got the definite impression that, far from anyone at all considering modern computer security a success (or anything you would want to emulate), the Snowden leaks came as an existential shock and a revelation of systemic failure to the security community. It collectively realized that it had been staggeringly complacent because the NSA had devoted a little effort to concealing its work, that the worst-case scenarios had been ludicrously optimistic, and that most research and effort was almost totally irrelevant to the NSA, which was still hacking everyone everywhere because it had simply shifted resources to attacking the weakest links, be it trusted third parties, decrypted content at rest, the endless list of implementation flaws (Heartbleed etc.), or universal attacks benefiting from precomputation. Even those who are the epitome of modern security, like Google, were appalled to discover that, rather than puzzle over the random oracle model or do deep R&D on quantum computing, the NSA would just tap their inter-datacenter links to get the data it wanted.
When I was researching my “Bitcoin is Worse is Better” essay, I came across a declassified NSA essay where the author visited an academic conference and concluded at the end, with insufferable—yet in retrospect, shockingly humble—arrogance, that while much of the research was of high quality and interesting, the researchers there were no threat to the NSA and never would be. I know I’m not the only one to be struck by that summary because Rogaway quotes it prominently in his recent “The Moral Character of Cryptographic Work” reflecting on the almost total failure of the security community to deliver, well, security. Bernstein also has many excellent critiques of the security community’s failure to deliver security and its frequent tendency towards “l’art pour l’art”.
The Snowden leaks have been to computer security what DNA testing was to the justice system or clinical trials to the medical system; there are probably better words for how the Snowden revelations made computer security researchers look than ‘silly’.
It’s a success story in the sense that there is a lot of solid work being done. It is not a success story in the sense that currently, and for the foreseeable future, attack >> defense (but this was true in lots of other areas of warfare throughout various periods of history). We wouldn’t consider armor research not a success story just because at some point flintlocks phased out heavy battlefield armor.
The fact that computer security is having a hard time solving a much easier problem with a ton more resources should worry people who are into AI safety.
I think you missed the point of my examples. If flintlocks killed heavy battlefield armor, that was because they were genuinely superior and better at attack. But we are not in a ‘machine gun vs bow and arrow’ situation.
The Snowden leaks were a revelation not because the NSA had any sort of major unexpected breakthrough. They have not solved factoring. They do not have quantum computers. They have not made major progress on P=NP or on reversing one-way functions. The most advanced stuff from all the Snowden leaks I’ve read was the amortized attack on common hardwired primes, but that again was something well known in the open literature, which is why we were able to figure it out from the hints in the leaks. In fact, the leaks strongly affirmed that the security community and crypto theory have reached parity with the NSA, that things like PGP were genuinely secure (as far as the crypto went...), and that there were no surprises like differential cryptanalysis waiting in the wings. This is great—except it doesn’t matter.
They were a revelation because they revealed how useless all of that parity was: the NSA simply attacked on the economic, business, political, and implementation planes. There is no need to beat PGP by factoring integers when you can simply tap into Gmail’s datacenters and read the emails already decrypted. There is no need to worry overly much about OTR when your TAO teams divert shipments from Amazon, insert a little hardware keylogger, and record everything and exfiltrate it over DNS. Get something into a computer’s BIOS and it’ll never come out. You don’t need to worry much about academics coming up with better hash functions when your affiliated academics, who know what side their bread is buttered on, will quietly quash them in committee or ensure that something like export-grade ciphers gets included. You don’t need to worry about spending too much on deep cryptanalysis when the existence of C ensures that there will always be zero-days for you to exploit. You don’t even need to worry about revealing capabilities when you can just leak information to your buddies in the FBI or DEA and they will work their tails off to come up with a plausible non-digital story which they can feed the judge. (Your biggest problems, really, are figuring out how not to drown under the tsunami of data coming in at you from all the hacked communications links, subverted computers, bulk collections from cloud datacenters, decrypted VPNs, etc.)
This isn’t like guns eliminating armor. This is like an army not bothering with sanitation and wondering why it keeps losing to the other guys, which turns out to be because the latrine contractors are giving kickbacks to the king’s brother.
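To make the “amortized attack on common hardwired primes” point concrete, here is a minimal Python sketch with entirely illustrative parameters (a toy 64-bit modulus and generator standing in for the widely shared 1024-bit primes at issue). The point it shows is only this: the expensive discrete-log precomputation depends on the prime alone, not on any individual session, so one precomputation pays off against every exchange that reuses the same prime.

```python
# Minimal illustrative sketch (toy parameters, not a real implementation):
# classic finite-field Diffie-Hellman where many deployments share one
# hardwired prime P. The expensive part of a discrete-log attack (the
# precomputation stage) depends only on P, so an attacker who pays that
# cost once can cheaply break every key exchange that reuses P.
import secrets

P = 0xFFFFFFFFFFFFFFC5   # toy 64-bit prime standing in for a shared 1024-bit modulus
G = 2                    # generator (illustrative)

def dh_keypair():
    priv = secrets.randbelow(P - 2) + 1   # secret exponent
    return priv, pow(G, priv, P)          # (private, public) pair

a_priv, a_pub = dh_keypair()
b_priv, b_pub = dh_keypair()

# Both sides derive the same shared secret from the other's public value...
assert pow(b_pub, a_priv, P) == pow(a_pub, b_priv, P)
# ...but an adversary who has already done the one-time precomputation for
# this particular P can recover individual discrete logs (and hence the
# shared secret) for every session that uses it.
```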
I agree that computer security having such a hard time with a much easier problem should absolutely worry people who are into AI safety, and it’s why I find it kind of hilarious when people seem to seriously think that to do AI safety, you just need some nested VMs and some protocols. That’s not remotely close to the full scope of the problem. It does no good to come up with a secure sandbox if dozens of external pressures and incentives and cost-cutting and competition mean that the AI will be immediately let out of the box.
(The trend towards attention mechanisms and reinforcement learning in deep learning is an example of this: tool AI technologies want to become agent AIs, because that is how you get rid of expensive slow humans in the loop, make better inferences and decisions, and optimize exploration by deciding what data you need and what experiments to try.)
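As a toy illustration of that parenthetical (purely hypothetical code, nothing from the comment itself): the step from “tool” to “agent” can be as small as letting the system choose its own next query. Even a minimal epsilon-greedy bandit loop already “decides what experiments to try”.

```python
# Minimal sketch (hypothetical toy example): a learner that chooses which
# experiment to run next, rather than only answering queries it is given.
# The exploration-control loop is what pulls tool-style systems toward agency.
import random

true_payoffs = [0.2, 0.5, 0.8]          # unknown to the learner
estimates = [0.0, 0.0, 0.0]
counts = [0, 0, 0]

def choose_experiment(epsilon=0.1):
    # The system, not a human operator, decides what data to gather next.
    if random.random() < epsilon:
        return random.randrange(len(estimates))                    # explore
    return max(range(len(estimates)), key=lambda i: estimates[i])  # exploit

for _ in range(1000):
    arm = choose_experiment()
    reward = float(random.random() < true_payoffs[arm])
    counts[arm] += 1
    estimates[arm] += (reward - estimates[arm]) / counts[arm]   # running mean

print(estimates)   # converges toward the true payoffs, fastest for the best arm
```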
Eliezer has said that security mindset is similar, but not identical, to the mindset needed for AI design. https://www.facebook.com/yudkowsky/posts/10153833539264228?pnref=story
Well, what a relief!
A fair point, though that mindset is hacker-like in nature. It is, basically, an automatic “how can I break or subvert this system?” reaction to everything.
But the thing is, computer security is an intensely practical field. It’s very much like engineering: it has to be realistic and implementable, bad things happen if it fucks up, people pay a lot of money for good solutions, and those solutions are often specific to the circumstances.
AI safety research at the moment is very far from this.
Not quite: computer security isn’t really about risks from unintelligent computers. It deals with managing risks coming from people; it’s just that the universe where it has to manage those risks is a weird superposition of the physical world (see hardware or physical-access attacks), the social world (see social-engineering attacks), and the cyberworld (see the usual ’sploit attacks).
I think many people intuitively distrust the idea that an AI could be intelligent enough to transform matter into paperclips in creative ways, but ‘not intelligent enough’ to understand its goals in a human and cultural context (i.e., to satisfy the needs of the business owners of the paperclip factory). This is often due to the confusion that the paperclip maximizer would get its goal function from parsing the sentence “make paperclips”, rather than from a preprogrammed reward function, for example a CNN that is trained to map the number of paperclips in images to a scalar reward.
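As a purely hypothetical sketch of the latter (the architecture, names, and sizes below are invented for illustration, and training is omitted): the “reward” such a maximizer optimizes is just the scalar output of a model like this, not any understanding of the sentence “make paperclips” or of the owners’ intent.

```python
# Hypothetical sketch (PyTorch): a small CNN whose scalar output is meant to
# track the number of paperclips visible in an image. An optimizer maximizing
# this number has no notion of the factory owners' goals.
import torch
import torch.nn as nn

class PaperclipCounter(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(16, 1)   # scalar reward: predicted paperclip count

    def forward(self, x):
        return self.head(self.features(x).flatten(1))

reward_model = PaperclipCounter()      # would be trained on labeled images
image = torch.rand(1, 3, 64, 64)       # stand-in camera frame
reward = reward_model(image)           # the agent maximizes this number
print(reward.item())
```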
Could well be. Does that have anything to do with pattern-matching AI risk to SF, though?
I was just speaking of weaknesses of the paperclip maximizer thought experiment. I’ve seen this misunderstanding in at least 4 out of 10 cases where the thought experiment was brought up.
If you are just trying to communicate risk, an analogy to a virus might be helpful in this respect. A natural virus can be thought of as code that has goals. If it harms humankind, it doesn’t ‘intend’ to; the harm is just a side effect of achieving its goals. We might create an artificial virus with a goal that everyone recognizes as beneficial (e.g., ending malaria), but that does harm due to unexpected consequences, or because the artificial virus evolves, self-modifying its original goal. Note that once a virus is released into the environment, it is nontrivial to ‘delete’ or ‘turn off’. An AI will operate in an environment that is many times more complex: “mindspace”.