I agree that ML often does this, but only in situations where the results don’t immediately matter. I’d find it much more compelling to see examples where the “random fix” caused actual bad consequences in the real world.
[...]
Perhaps people are optimizing for “making pretty pictures” instead of “negative log likelihood”. I wouldn’t be surprised if for many applications of GANs, diversity of images is not actually that important, and what you really want is that the few images you do generate look really good. In that case, it makes complete sense to push primarily on GANs, and while you try to address mode collapse, when faced with a tradeoff you choose GANs over VAEs anyway.
This is fair. However, the point of the example is more that mode dropping and bad NLL were not noticed when people started optimizing GANs for image quality. As far as I can tell, it took a while for individuals to notice, longer for it to become common knowledge, and even more time for anyone to do anything about it. Even now, the “solutions” are hacks that don’t completely resolve the issue.
There was a large window of time where a practitioner could implement a GAN expecting it to cover all the modes. If there were a world where failing to cover all the modes of the distribution led to large negative consequences, the failure would probably have gone unnoticed until it was too late.
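To make the mode-coverage point concrete, here is a minimal toy sketch of the diagnostic in question. It does not train a GAN or a VAE; the "collapsed sampler" below simply stands in for a generator that has dropped modes, and the mixture, thresholds, and sample counts are all made up for illustration.

```python
import math
import random

random.seed(0)

# Toy target: an equal-weight mixture of 8 well-separated 1-D Gaussian modes.
MODES = [4.0 * i for i in range(8)]
SIGMA = 0.2

def sample_target() -> float:
    return random.gauss(random.choice(MODES), SIGMA)

def collapsed_sampler() -> float:
    # Stand-in for a mode-collapsed generator that only ever produces two of the modes.
    return random.gauss(random.choice(MODES[:2]), SIGMA)

def modes_covered(sampler, n: int = 5000, tol: float = 1.0) -> int:
    """Count how many true modes receive at least one sample within `tol`."""
    hit = set()
    for _ in range(n):
        x = sampler()
        for i, m in enumerate(MODES):
            if abs(x - m) < tol:
                hit.add(i)
    return len(hit)

def avg_nll(samples, modes) -> float:
    """Average negative log likelihood of `samples` under an equal-weight mixture on `modes`."""
    total = 0.0
    for x in samples:
        p = sum(math.exp(-(x - m) ** 2 / (2 * SIGMA ** 2)) for m in modes)
        p /= len(modes) * SIGMA * math.sqrt(2 * math.pi)
        total += -math.log(p + 1e-300)
    return total / len(samples)

held_out = [sample_target() for _ in range(2000)]
print("modes covered by target sampler:   ", modes_covered(sample_target), "/", len(MODES))
print("modes covered by collapsed sampler:", modes_covered(collapsed_sampler), "/", len(MODES))
print("held-out NLL, model with all modes:", round(avg_nll(held_out, MODES), 2))
print("held-out NLL, collapsed model:     ", round(avg_nll(held_out, MODES[:2]), 2))
# The collapsed sampler can still produce sharp individual samples, but the coverage
# count and the held-out NLL both expose the dropped modes immediately.
```

The point of the sketch is just that the failure is invisible if you only ever look at sample quality, and obvious the moment anyone computes a coverage or likelihood metric.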
Here's a real example. This is the NTSB crash report for the Uber autonomous vehicle that killed a pedestrian. Someone should probably do an in-depth analysis of the whole thing, but for now I'll draw your attention to section 1.6.2. Hazard Avoidance and Emergency Braking. In it they say:
When the system detects an emergency situation, it initiates action suppression. This is a
one-second period during which the ADS suppresses planned braking while the (1) system verifies
the nature of the detected hazard and calculates an alternative path, or (2) vehicle operator takes
control of the vehicle. ATG stated that it implemented action suppression process due to the
concerns of the developmental ADS identifying false alarms—detection of a hazardous situation
when none exists—causing the vehicle to engage in unnecessary extreme maneuvers.
[...]
if the collision cannot be avoided with the application of the maximum allowed braking,
the system is designed to provide an auditory warning to the vehicle operator while
simultaneously initiating gradual vehicle slowdown. In such circumstance, ADS would not
apply the maximum braking to only mitigate the collision.
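To make the quoted policy easier to examine, here is a minimal sketch of the decision logic as the report describes it. This is a hypothetical reconstruction for illustration only, not ATG's actual implementation; the one-second window comes from the report, but every function name, data field, and branch condition below is invented.

```python
from dataclasses import dataclass

ACTION_SUPPRESSION_WINDOW_S = 1.0  # one-second suppression period described in the report


@dataclass
class Hazard:
    time_since_detection_s: float     # seconds since the hazard was detected
    operator_has_control: bool        # has the vehicle operator taken over?
    verified: bool                    # has the system confirmed the hazard is real?
    avoidable_with_max_braking: bool  # can maximum allowed braking prevent impact?


def planned_response(hazard: Hazard) -> str:
    """Hypothetical reconstruction of the action-suppression logic in section 1.6.2."""
    # During the suppression window, planned braking is withheld while the system
    # re-verifies the detection and computes an alternative path, or the operator
    # takes control.
    if hazard.time_since_detection_s < ACTION_SUPPRESSION_WINDOW_S:
        if hazard.operator_has_control:
            return "yield to operator"
        return "suppress braking; verify hazard and compute alternative path"

    # After the window, respond only if the detection has been verified
    # (an assumption of this sketch; the report does not spell this out).
    if not hazard.verified:
        return "no action (treated as false alarm)"

    if hazard.avoidable_with_max_braking:
        return "apply maximum allowed braking"

    # If the collision cannot be avoided, the system warns the operator and slows
    # gradually rather than braking maximally to mitigate the impact.
    return "auditory warning to operator; gradual slowdown"


if __name__ == "__main__":
    late_detection = Hazard(
        time_since_detection_s=1.2,
        operator_has_control=False,
        verified=True,
        avoidable_with_max_braking=False,
    )
    print(planned_response(late_detection))
```

Spelled out this way, both the suppression window and the gradual-slowdown branch read as workarounds for a perception system that cannot reliably distinguish real hazards from false alarms.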
This strikes me as a “random fix” where the core issue was that the system did not have sufficient discriminatory power to tell apart a safe situation from an unsafe situation. Instead of properly solving this problem, the researchers put in a hack.
Suppose that we had extremely compelling evidence that any AI system run with > X amount of compute would definitely kill us all. Do you expect that problem to get swept under the rug?
I agree that we shouldn’t be worried about situations where there is a clear threat. But that’s not quite the class of failures that I’m worried about. Fairness, bias, and adversarial examples are all closer to what I’m getting at. The general pattern is that ML researchers hack together a system that works, but has some problems they’re unaware of. Later, the problems are discovered and the reaction is to hack together a solution. This is pretty much the opposite of the safety mindset EY was talking about. It leaves room for catastrophe in the initial window when the problem goes undetected, and indefinitely afterwards if the hack is insufficient to deal with the issue.
More specifically, I’m worried about a situation where at some point during grad student descent someone says, “That’s funny...” then goes on to publish their work. Later, someone else deploys their idea plus 3 orders of magnitude more computing power and we all die. That, or we don’t all die. Instead we resolve the issue with a hack. Then a couple bumps in computing power and capabilities later we all die.
The above comes across as both paranoid and far-fetched, and I’m not sure the AI community will take on the required level of caution to prevent it unless we get an AI equivalent of Chernobyl before we get UFAI. Nuclear reactor design is the only domain I know of where people are close to sufficiently paranoid.
I’m not sure the AI community will take on the required level of caution to prevent it unless we get an AI equivalent of Chernobyl before we get UFAI.
Important thing to remember is that Rohin is explicitly talking about a non-foom scenario, so the assumption is that humanity would survive AI-Chernobyl.
My worry is less that we wouldn’t survive AI-Chernobyl as much as it is that we won’t get an AI-Chernobyl.
I think that this is where there’s a difference in models. Even in a non-FOOM scenario I’m having a hard time envisioning a world where the gap in capabilities between AI-Chernobyl and global catastrophic UFAI is that large. I used Chernobyl as an example because it scared the public and the industry into making things very safe. It had a lot going for it to make that happen. Radiation is invisible and hurts you by either killing you instantly, making your skin fall off, or giving you cancer and birth defects. The disaster was also extremely expensive, with total costs on the order of 10^11 USD.
If a defective AI system manages to do something that instils the same level of fear into researchers and the public as Chernobyl did, I would expect that we were on the cusp of building systems that we couldn’t control at all.
If I’m right and the gap between those two events is small, then there’s a significant risk that nothing will happen in that window. We’ll get plenty of warnings that won’t be sufficient to instil the necessary level of caution into the community, and later down the road we’ll find ourselves in a situation we can’t recover from.
My impression is that people working on self-driving cars are incredibly safety-conscious, because the risks are very salient.
I don’t think AI-Chernobyl has to be a Chernobyl level disaster, just something that makes the risks salient. E.g. perhaps an elder care AI robot pretends that all of its patients are fine in order to preserve its existence, and this leads to a death and is then discovered. If hospitals let AI algorithms make decisions about drugs according to complicated reward functions, I would expect this to happen with current capabilities. (It’s notable to me that this doesn’t already happen, given the insane hype around AI.)
My impression is that people working on self-driving cars are incredibly safety-conscious, because the risks are very salient.
Safety-conscious people working on self-driving cars don’t program their cars to not take evasive action after detecting that a collision is imminent.
(It’s notable to me that this doesn’t already happen, given the insane hype around AI.)
I think it already has. (It was for extra care, not drugs, but it’s a clear-cut case of a misspecified objective function leading to suboptimal decisions for a multitude of individuals.) I’ll note, perhaps unfairly, that the fact that this study was not salient enough to make it to your attention even with a culture war signal boost is evidence that it needs to be a Chernobyl-level event.
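As a toy illustration of the failure mode that study describes (optimizing a proxy objective instead of the quantity you actually care about), the sketch below ranks simulated patients for an extra-care program by observed cost rather than by true need. The numbers, group labels, and access model are all invented for illustration; this is not the study's data or method.

```python
import random

random.seed(0)

def simulate_patient(low_access: bool) -> dict:
    """Toy patient: true need is what we care about; observed cost is the proxy."""
    need = random.gauss(5.0, 2.0)
    # Assumption of this toy model: patients with less access to care generate lower
    # costs at the same level of need, creating a proxy/target mismatch.
    access_factor = 0.6 if low_access else 1.0
    cost = max(0.0, need * access_factor + random.gauss(0.0, 0.5))
    return {"need": need, "cost": cost, "low_access": low_access}

patients = [simulate_patient(low_access=(i % 2 == 0)) for i in range(10_000)]
k = 1_000  # extra-care program capacity

by_cost = sorted(patients, key=lambda p: p["cost"], reverse=True)[:k]
by_need = sorted(patients, key=lambda p: p["need"], reverse=True)[:k]

frac_low_access_cost = sum(p["low_access"] for p in by_cost) / k
frac_low_access_need = sum(p["low_access"] for p in by_need) / k

print(f"low-access share when ranking by cost (proxy):  {frac_low_access_cost:.2%}")
print(f"low-access share when ranking by need (target): {frac_low_access_need:.2%}")
# The proxy objective systematically under-selects the low-access group even though
# their underlying need is identical by construction.
```

By construction the two groups have identical need, yet the cost-ranked program under-enrolls the group that generates lower costs for the same need, which matches the commenter's description of a misspecified objective producing suboptimal decisions at scale.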
I agree that Tesla does not seem very safety conscious (but it’s notable that they are still safer than human drivers in terms of fatalities per mile, if I remember correctly?)
Huh, what do you know.
Faced with an actual example, I’m realizing that what I actually expect would cause people to take it more seriously is a) the belief that AGI is near and b) an example where the AI algorithm “deliberately” causes a problem (i.e. “with full knowledge” that the thing it was doing was not what we wanted). I think most deep RL researchers already believe that reward hacking is a thing (which is what that study shows).
even with a culture war signal boost
Tangential, but that makes it less likely that I read it; I try to completely ignore anything with the term “racial bias” in its title unless it’s directly pertinent to me. (Being about AI isn’t enough to make it pertinent to me.)
Faced with an actual example, I’m realizing that what I actually expect would cause people to take it more seriously is a) the belief that AGI is near and b) an example where the AI algorithm “deliberately” causes a problem (i.e. “with full knowledge” that the thing it was doing was not what we wanted).
What do you expect the ML community to do at that point? Coordinate to stop or slow down the race to AGI until AI safety/alignment is solved? Or do you think each company/lab will unilaterally invest more into safety/alignment without slowing down capability research much, and that will be sufficient? Or something else?
I worry about a parallel with the “energy community”, a large part of which not just ignores but actively tries to obscure or downplay warning signs about future risks associated with certain forms of energy production. Given that the run-up to AGI will likely generate huge profits for AI companies as well as provide clear benefits for many people (compared to which, the disasters that will have occurred by then may well seem tolerable by comparison), and given probable disagreements between different experts about how serious the future risks are, it seems likely to me that AI risk will become politicized/controversial in a way similar to climate change, which will prevent effective coordination around it.
On the other hand… maybe AI will be more like nuclear power than fossil fuels, and a few big accidents will stall its deployment for quite a while. Is this why you’re relatively optimistic about AI risk being taken seriously, and if so can you share why you think nuclear power is a closer analogy?
What do you expect the ML community to do at that point?
It depends a lot on the particular warning shot that we get. But on the strong versions of warning shots, where there’s common knowledge that building an AGI runs a substantial risk of destroying the world, yes, I expect them to not build AGI until safety is solved. (Not to the standard you usually imagine, where we must also solve philosophical problems, but to the standard I usually imagine, where the AGI is not trying to deceive us or work against us.)
This depends on other background factors, e.g. how much the various actors think they are value-aligned vs. in zero-sum competition. I currently think the ML community thinks they are mostly but not fully value-aligned, and they will influence companies and governments in that direction. (I also want more longtermists to be trying to build more common knowledge of how much humans are value aligned, to make this more likely.)
I worry about a parallel with the “energy community”
The major disanalogy is that catastrophic outcomes of climate change do not personally affect the CEOs of energy companies very much, whereas AI x-risk affects everyone. (Also, maybe we haven’t gotten clear and obvious warning shots?)
(compared to which, the disasters that will have occurred by then may well seem tolerable by comparison), and given probable disagreements between different experts about how serious the future risks are
I agree that my story requires common knowledge of the risk of building AGI, in the sense that you need people to predict “running this code might lead to all humans dying”, and not “running this code might lead to <warning shot effect>”. You also need relative agreement on the risks.
I think this is pretty achievable. Most of the ML community already agrees that building an AGI is high-risk if not done with some argument for safety. The thing people tend to disagree on is when we will get AGI and how much we should work on safety before then.
But on the strong versions of warning shots, where there’s common knowledge that building an AGI runs a substantial risk of destroying the world, yes, I expect them to not build AGI until safety is solved. (Not to the standard you usually imagine, where we must also solve philosophical problems, but to the standard I usually imagine, where the AGI is not trying to deceive us or work against us.)
To the extent that we expect strong warning shots and ability to avoid building AGI upon receiving such warning shots, this seems like an argument for researchers/longtermists to work on / advocate for safety problems beyond the standard of “AGI is not trying to deceive us or work against us” (because that standard will likely be reached anyway). Do you agree?
The major disanalogy is that catastrophic outcomes of climate change do not personally affect the CEOs of energy companies very much, whereas AI x-risk affects everyone.
Some types of AI x-risk don’t affect everyone though (e.g., ones that reduce the long term value of the universe or multiverse without killing everyone in the near term).
To the extent that we expect strong warning shots and ability to avoid building AGI upon receiving such warning shots, this seems like an argument for researchers/longtermists to work on / advocate for safety problems beyond the standard of “AGI is not trying to deceive us or work against us” (because that standard will likely be reached anyway). Do you agree?
Yes.
Some types of AI x-risk don’t affect everyone though (e.g., ones that reduce the long term value of the universe or multiverse without killing everyone in the near term).
Agreed, all else equal those seem more likely to me.
Ok, I wasn’t sure that you’d agree, but given that you do, it seems that when you wrote the title of this newsletter “Why AI risk might be solved without additional intervention from longtermists” you must have meant “Why some forms of AI risk …”, or perhaps certain forms of AI risk just didn’t come to your mind at that time. In either case it seems worth clarifying somewhere that you don’t currently endorse interpreting “AI risk” as “AI risk in its entirety” in that sentence.
Similarly, on the inside you wrote:
The main reason I am optimistic about AI safety is that we will see problems in advance, and we will solve them, because nobody wants to build unaligned AI. A likely crux is that I think that the ML community will actually solve the problems, as opposed to applying a bandaid fix that doesn’t scale. I don’t know why there are different underlying intuitions here.
It seems worth clarifying that you’re only optimistic about certain types of AI safety problems.
(I’m basically making the same complaint/suggestion that I made to Matthew Barnett not too long ago. I don’t want to be too repetitive or annoying, so let me know if I’m starting to sound that way.)
It seems worth clarifying that you’re only optimistic about certain types of AI safety problems.
Tbc, I’m optimistic about all the types of AI safety problems that people have proposed, including the philosophical ones. When I said “all else equal those seem more likely to me”, I meant that if all the other facts about the matter are the same, but one risk affects only future people and not current people, that risk would seem more likely to me because people would care less about it. But I am optimistic about the actual risks that you and others argue for.
That said, over the last week I have become less optimistic specifically about overcoming race dynamics, mostly from talking to people at FHI / GovAI. I’m not sure how much to update though. (Still broadly optimistic.)
it seems that when you wrote the title of this newsletter “Why AI risk might be solved without additional intervention from longtermists” you must have meant “Why some forms of AI risk …”, or perhaps certain forms of AI risk just didn’t come to your mind at that time.
It’s notable that AI Impacts asked for people who were skeptical of AI risk (or something along those lines) and to my eye it looks like all four of the people in the newsletter independently interpreted that as accidental technical AI risk in which the AI is adversarially optimizing against you (or at least that’s what the four people argued against). This seems like pretty strong evidence that when people hear “AI risk” they now think of technical accidental AI risk, regardless of what the historical definition may have been. I certainly know that is my default assumption when someone (other than you) says “AI risk”.
I would certainly support having clearer definitions and terminology if we could all agree on them.
But I am optimistic about the actual risks that you and others argue for.
Why? I actually wrote a reply that was more questioning in tone, and then changed it because I found some comments you made where you seemed to be concerned about the additional AI risks. Good thing I saved a copy of the original reply, so I’ll just paste it below:
I wonder if you would consider writing an overview of your perspective on AI risk strategy. (You do have a sequence but I’m looking for something that’s more comprehensive, that includes e.g. human safety and philosophical problems. Or let me know if there’s an existing post that I’ve missed.) I ask because you’re one of the most prolific participants here but don’t fall into one of the existing “camps” on AI risk for whom I already have good models. It’s happened several times that I see a comment from you that seems wrong or unclear, but I’m afraid to risk being annoying or repetitive with my questions/objections. (I sometimes worry that I’ve already brought up some issue with you and then forgot your answer.) It would help a lot to have a better model of you in my head and in writing so I can refer to that to help me interpret what the most likely intended meaning of a comment is, or to predict how you would likely answer if I were to ask certain questions.
It’s notable that AI Impacts asked for people who were skeptical of AI risk (or something along those lines) and to my eye it looks like all four of the people in the newsletter independently interpreted that as accidental technical AI risk in which the AI is adversarially optimizing against you (or at least that’s what the four people argued against).
Maybe that’s because the question was asked in a way that indicated the questioner was mostly interested in technical accidental AI risk? And some of them may be fine with defining “AI risk” as “AI-caused x-risk” but just didn’t have the other risks on the top of their minds, because their personal focus is on the technical/accidental side. In other words I don’t think this is strong evidence that all 4 people would endorse defining “AI risk” as “technical accidental AI risk”. It also seems notable that I’ve been using “AI risk” in a broad sense for a while and no one has objected to that usage until now.
I would certainly support having clearer definitions and terminology if we could all agree on them.
The current situation seems to be that we have two good (relatively clear) terms “technical accidental AI risk” and “AI-caused x-risk” and the dispute is over what plain “AI risk” should be shorthand for. Does that seem fair?
I ask because you’re one of the most prolific participants here but don’t fall into one of the existing “camps” on AI risk for whom I already have good models.
Seems right, I think my opinions fall closest to Paul’s, though it’s also hard for me to tell what Paul’s opinions are. I think this older thread is a relatively good summary of the considerations I tend to think about, though I’d place different emphases now. (Sadly I don’t have the time to write a proper post about what I think about AI strategy—it’s a pretty big topic.)
The current situation seems to be that we have two good (relatively clear) terms “technical accidental AI risk” and “AI-caused x-risk” and the dispute is over what plain “AI risk” should be shorthand for. Does that seem fair?
Yes, though I would frame it as “the ~5 people reading these comments have two clear terms, while everyone else uses a confusing mishmash of terms”. The hard part is in getting everyone else to use the terms. I am generally skeptical of deciding on definitions and getting everyone else to use them, and usually try to use terms the way other people use terms.
In other words I don’t think this is strong evidence that all 4 people would endorse defining “AI risk” as “technical accidental AI risk”. It also seems notable that I’ve been using “AI risk” in a broad sense for a while and no one has objected to that usage until now.
Agreed with this, but see above about trying to conform with the way terms are used, rather than defining terms and trying to drag everyone else along.
This seems odd given your objection to “soft/slow” takeoff usage and your advocacy of “continuous takeoff” ;)
I don’t think “soft/slow takeoff” has a canonical meaning—some people (e.g. Paul) interpret it as not having discontinuities, while others interpret it as capabilities increasing slowly past human intelligence over (say) centuries (e.g. Superintelligence). If I say “slow takeoff” I don’t know which one the listener is going to hear it as. (And if I had to guess, I’d expect they think about the centuries-long version, which is usually not the one I mean.)
In contrast, I think “AI risk” has a much more canonical meaning, in that if I say “AI risk” I expect most listeners to interpret it as accidental risk caused by the AI system optimizing for goals that are not our own.
(Perhaps an important point is that I’m trying to communicate to a much wider audience than the people who read all the Alignment Forum posts and comments. I’d feel more okay about “slow takeoff” if I was just speaking to people who have read many of the posts already arguing about takeoff speeds.)
AI risk is just a shorthand for “accidental technical AI risk.” To the extent that people are confused, I agree it’s probably worth clarifying the type of risk by adding “accidental” and “technical” whenever we can.
However, I disagree with the idea that we should expand the word AI risk to include philosophical failures and intentional risks. If you open the term up, these outcomes might start to happen:
It becomes unclear in conversation what people mean when they say AI risk
Like The Singularity, it becomes a buzzword.
Journalists start projecting Terminator scenarios onto the words, and now have justification because even the researchers say that AI risk can mean a lot of different things.
It puts a whole bunch of types of risk into one basket, suggesting to outsiders that all attempts to reduce “AI risk” might be equally worthwhile.
ML researchers start to distrust AI risk researchers, because people who are worried about the Terminator are using the same words as the AI risk researchers and therefore get associated with them.
This can all be avoided by having a community norm to clarify that we mean technical accidental risk when we say AI risk, and when we’re talking about other types of risks we use more precise terminology.
AI risk is just a shorthand for “accidental technical AI risk.”
I don’t think “AI risk” was originally meant to be a shorthand for “accidental technical AI risk”. The earliest considered (i.e., not off-hand) usage I can find is in the title of Luke Muehlhauser’s AI Risk and Opportunity: A Strategic Analysis where he defined it as “the risk of AI-caused extinction”.
(He used “extinction” but nowadays we tend to think in terms of “existential risk” which also includes “permanent large negative consequences”, which seems like a reasonable expansion of “AI risk”.)
However, I disagree with the idea that we should expand the word AI risk to include philosophical failures and intentional risks.
I want to include philosophical failures, as long as the consequences of the failures flow through AI, because (aside from historical usage) technical problems and philosophical problems blend into each other, and I don’t see a point in drawing an arbitrary and potentially contentious border between them. (Is UDT a technical advance or a philosophical advance? Is defining the right utility function for a Sovereign Singleton a technical problem or a philosophical problem? Why force ourselves to answer these questions?)
As for “intentional risks” it’s already common practice to include that in “AI risk”:
Dividing AI risks into misuse risks and accident risks has become a prevailing approach in the field.
Besides that, I think there’s also a large grey area between “accident risk” and “misuse” where the risk partly comes from technical/philosophical problems and partly from human nature. For example humans might be easily persuaded by wrong but psychologically convincing moral/philosophical arguments that AIs can come up with and then order their AIs to do terrible things. Even pure intentional risks might have technical solutions. Again I don’t really see the point of trying to figure out which of these problems should be excluded from “AI risk”.
It becomes unclear in conversation what people mean when they say AI risk
It seems perfectly fine to me to use that as shorthand for “AI-caused x-risk” and use more specific terms when we mean more specific risks.
Like The Singularity, it becomes a buzzword
What do you mean? Like people will use “AI risk” when their project has nothing to do with “AI-caused x-risk”? Couldn’t they do that even if we define “AI risk” to be “accidental technical AI risk”?
Journalists start projecting Terminator scenarios onto the words, and now have justification because even the researchers say that AI risk can mean a lot of different things.
Terminator scenarios seem to be scenarios of “accidental technical AI risk” (they’re just not very realistic scenarios) so I don’t see how defining “AI risk” to mean that would prevent journalists from using Terminator scenarios to illustrate “AI risk”.
It puts a whole bunch of types of risk into one basket, suggesting to outsiders that all attempts to reduce “AI risk” might be equally worthwhile.
I don’t think this is a good argument, because even within “accidental technical AI risk” there are different problems that aren’t equally worthwhile to solve, so why aren’t you already worried about outsiders thinking all those problems are equally worthwhile?
ML researchers start to distrust AI risk researchers, because people who are worried about the Terminator are using the same words as the AI risk researchers and therefore get associated with them.
See my response above regarding “Terminator scenarios”.
This can all be avoided by having a community norm to clarify that we mean technical accidental risk when we say AI risk, and when we’re talking about other types of risks we use more precise terminology.
I propose that we instead stick with historical precedent and keep “AI risk” to mean “AI-caused x-risk” and use more precise terminology to refer to more specific types of AI-caused x-risk that we might want to talk about. Aside from what I wrote above, it’s just more intuitive/commonsensical that “AI risk” means “AI-caused x-risk” in general instead of a specific kind of AI-caused x-risk.
However I appreciate that someone who works mostly on the less philosophical / less human-related problems might find it tiresome to say or type “technical accidental AI risk” all the time to describe what they do or to discuss the importance of their work, and can find it very tempting to just use “AI risk”. It would probably be good to create a (different) shorthand or acronym for it to remove this temptation and to make their lives easier.
I appreciate the arguments, and I think you’ve mostly convinced me, mostly because of the historical argument.
I do still have some remaining apprehension about using AI risk to describe every type of risk arising from AI.
I want to include philosophical failures, as long as the consequences of the failures flow through AI, because (aside from historical usage) technical problems and philosophical problems blend into each other, and I don’t see a point in drawing an arbitrary and potentially contentious border between them.
That is true. The way I see it, UDT is definitely on the technical side, even though it incorporates a large amount of philosophical background. When I say technical, I mostly mean “specific, uses math, has clear meaning within the language of computer science” rather than a more narrow meaning of “is related to machine learning” or something similar.
My issue with arguing for philosophical failure is that, as I’m sure you’re aware, there’s a well known failure mode of worrying about vague philosophical problems rather than more concrete ones. Within academic philosophy, the majority of discussion surrounding AI is centered around consciousness, intentionality, whether it’s possible to even construct a human-like machine, whether they should have rights etc.
There’s a unique thread of philosophy that arose from LessWrong, which includes work on decision theory, that doesn’t focus on these thorny and low-priority questions. While I’m comfortable with you arguing that philosophical failure is important, my impression is that the overly philosophical approach used by many people has done more harm than good for the field in the past, and continues to do so.
It is therefore sometimes nice to tell people that the problems that people work on here are concrete and specific, and don’t require doing a ton of abstract philosophy or political advocacy.
I don’t think this is a good argument, because even within “accidental technical AI risk” there are different problems that aren’t equally worthwhile to solve, so why aren’t you already worried about outsiders thinking all those problems are equally worthwhile?
This is true, but my impression is that when you tell people that a problem is “technical” it generally makes them refrain from having a strong opinion before understanding a lot about it. “Accidental” also reframes the discussion by reducing the risk of polarizing biases. This is a common theme in many fields:
Physicists sometimes get frustrated with people arguing about “the philosophy of the interpretation of quantum mechanics” because there’s a large subset of people who think that since it’s philosophical, then you don’t need to have any subject-level expertise to talk about it.
Economists try to emphasize that they use models and empirical data, because a lot of people think that their field of study is more-or-less just high status opinion + math. Emphasizing that there are real, specific models that they study helps to reduce this impression. Same with political science.
A large fraction of tech workers are frustrated about the use of Machine Learning as a buzzword right now, and part of it is that people started saying Machine Learning = AI rather than Machine Learning = Statistics, and so a lot of people thought that even if they don’t understand statistics, they can understand AI since that’s like philosophy and stuff.
But I’ve drawn much closer to the community over the last few years, because of a combination of factors: [...] The AI-risk folks started publishing some research papers that I found interesting—some with relatively approachable problems that I could see myself trying to think about if quantum computing ever got boring. This shift seems to have happened at roughly around the same time my former student, Paul Christiano, “defected” from quantum computing to AI-risk research.
My guess is that this shift in his thinking occurred because a lot of people started talking about technical risks from AI, rather than framing it as a philosophy problem, or a problem of eliminating bad actors. Eliezer has shared this viewpoint for years, writing in the CEV document,
Warning: Beware of things that are fun to argue.
reflecting the temptation to derail discussions about technical accidental risks.
Also, isn’t defining “AI risk” as “technical accidental AI risk” analogous to defining “apple” as “red apple” (in terms of being circular/illogical)? I realize natural language doesn’t have to be perfectly logical, but this still seems a bit too egregious.
I agree that this is troubling, though I think it’s similar to how I wouldn’t want the term biorisk to be expanded to include biodiversity loss (a risk, but not the right type), regular human terrorism (humans are biological, but it’s a totally different issue), zombie uprisings (they are biological, but it’s totally ridiculous), alien invasions etc.
Not to say that’s what you are doing with AI risk. I’m worried about what others will do with it if the term gets expanded.
I agree that this is troubling, though I think it’s similar to how I wouldn’t want the term biorisk to be expanded …
Well as I said, natural language doesn’t have to be perfectly logical, and I think “biorisk” is somewhat in that category, but there’s an explanation that makes it a bit more reasonable than it might first appear, which is that the “bio” refers not to “biological” but to “bioweapon”. This is actually one of the definitions that Google gives when you search for “bio”: “relating to or involving the use of toxic biological or biochemical substances as weapons of war. ‘bioterrorism’”
I guess the analogous thing would be if we start using “AI” to mean “technical AI accidents” in a bunch of phrases, which feels worse to me than the “bio” case, maybe because “AI” is a standalone word/acronym instead of a prefix? Does this make sense to you?
Not to say that’s what you are doing with AI risk. I’m worried about what others will do with it if the term gets expanded.
But the term was expanded from the beginning. Have you actually observed it being used in ways that you fear (and which would be prevented if we were to redefine it more narrowly)?
Yeah that makes sense. Your points about “bio” not being short for “biological” were valid, but the fact that as a listener I didn’t know that fact implies that it seems really easy to mess up the language usage here. I’m starting to think that the real fight should be about using terms that aren’t self explanatory.
Have you actually observed it being used in ways that you fear (and which would be prevented if we were to redefine it more narrowly)?
I’m not sure about whether it would have been prevented by using the term more narrowly, but in my experience the most common reaction people outside of EA/LW (and even sometimes within) have to hearing about AI risk is to assume that it’s not technical, and to assume that it’s not about accidents. In that sense, I have been exposed to quite a bit of this already.
As far as I can tell, it took a while for individuals to notice, longer for it to become common knowledge, and even more time for anyone to do anything about it.
Tangential, but I wouldn’t be surprised if researchers were fairly quickly aware of the issue (e.g. within two years of the original GAN paper), but it took a while to become common knowledge because it isn’t particularly flashy. (There’s a surprising-to-me amount of know-how that is stored in researcher’s brains and never put down on paper.)
Even now, the “solutions” are hacks that don’t completely resolve the issue.
I mean, the solution is to use a VAE. If you care about covering modes but not image quality, you choose a VAE; if you care about image quality but not covering modes, you choose a GAN.
This strikes me as a “random fix” where the core issue was that the system did not have sufficient discriminatory power to tell apart a safe situation from an unsafe situation. Instead of properly solving this problem, the researchers put in a hack.
Agreed, I would guess that the researchers / engineers knew this was risky and thought it was worth it anyway. Or perhaps the managers did. But I do agree this is evidence against my position.
I agree that we shouldn’t be worried about situations where there is a clear threat. But that’s not quite the class of failures that I’m worried about. [...] Later, the problems are discovered and the reaction is to hack together a solution.
Why isn’t the threat clear once the problems are discovered?
unless we get an AI equivalent of Chernobyl before we get UFAI.
Part of my claim is that we probably will get that (assuming AI really is risky), though perhaps not Chernobyl-level disaster, but still something with real negative consequences that “could be worse”.
Why isn’t the threat clear once the problems are discovered?
I think I should be more specific, when you say:
Suppose that we had extremely compelling evidence that any AI system run with > X amount of compute would definitely kill us all. Do you expect that problem to get swept under the rug?
I mean that no one sane who knows that will run that AI system with > X amount of computing power. When I wrote that comment I also thought that anyone sane would blow the whistle in that event. See my note at the end of the comment.*
However, when presented with that evidence, I don’t expect the AI community to react appropriately. The correct response to that evidence is to stop what you’re doing, and revisit the entire process and culture that led to the creation of an algorithm that will kill us all if run with >X amount of compute. What I expect will happen is that the AI community will try and solve the problem the same way it’s solved every other problem it has encountered. It will try an inordinate number of unprincipled hacks to get around the issue.
Part of my claim is that we probably will get that (assuming AI really is risky), though perhaps not Chernobyl-level disaster, but still something with real negative consequences that “could be worse”.
Conditional on no FOOM, I can definitely see plenty of events with real negative consequences that “could be worse”. However, I claim that anything short of a Chernobyl-level event won’t shock the community and the world into changing its culture or trying to coordinate. I also claim that the capabilities gap between a Chernobyl-level event and a global catastrophic event is small, such that even in a non-FOOM scenario the former might not happen before the latter. Together, I think that there is a high probability that we will not get a disaster that is scary enough to get the AI community to change its culture and coordinate before it’s too late.
*Now that I think about it more though, I’m less sure. Undergraduate engineers get entire lectures dedicated to how and when to blow the whistle when faced with unethical corporate practices and dangerous projects or designs. When working, they also have insurance and some degree of legal protection from vengeful employers. Even then, you still see cover-ups of shortcomings that lead to major industrial disasters. For instance, long before the disaster, someone had determined that the Fukushima plant was indeed vulnerable to large tsunami impacts. The pattern where someone knows that something will go wrong but nothing is done to prevent it for one reason or another is not that uncommon in engineering disasters. Regardless of whether this is due to hindsight bias or an inadequate process for addressing safety issues, these disasters still happen regularly in fields with far more conservative, cautious, and safety-oriented cultures.
I find it unlikely that the field of AI will change its culture from one of moving fast and hacking to something even more conservative and cautious than the cultures of consumer aerospace and nuclear engineering.
Idk, I don’t know what to say here. I meet lots of AI researchers, and the best ones seem to me to be quite thoughtful. I can say what would change my mind:
I take the exploration of unprincipled hacks as very weak evidence against my position, if it’s just in an academic paper. My guess is the researchers themselves would not advocate deploying their solution, or would say that it’s worth deploying but it’s an incremental improvement that doesn’t solve the full problem. And even if the researchers don’t say that, I suspect the companies actually deploying the systems would worry about it.
I would take the deployment of unprincipled hacks more seriously as evidence, but even there I would want to be convinced that shutting down the AI system was a better decision than deploying an unprincipled hack. (Because then I would have made the same decision in their shoes.)
Unprincipled hacks are in fact quite useful for the vast majority of problems; as a result it seems wrong to attribute irrationality to people because they use unprincipled hacks.
[...]
This is fair. However, the point of the example is more that mode dropping and bad NLL were not noticed when people started optimizing GANs for image quality. As far as I can tell, it took a while for individuals to notice, longer for it to become common knowledge, and even more time for anyone to do anything about it. Even now, the “solutions” are hacks that don’t completely resolve the issue.
There was a large window of time where a practitioner could implement a GAN expecting it to cover all the modes. If there was a world where failing to cover all the modes of the distribution lead to large negative consequences, the failure would probably have gone unnoticed until it was too late.
Here’s a real example. This is the NTSB crash report for the Uber autonomous vehicle that killed a pedestrian. Someone should probably do an in depth analysis of the whole thing, but for now I’ll draw your attention to section 1.6.2. Hazard Avoidance and Emergency Braking. In it they say:
[...]
This strikes me as a “random fix” where the core issue was that the system did not have sufficient discriminatory power to tell apart a safe situation from an unsafe situation. Instead of properly solving this problem, the researchers put in a hack.
I agree that we shouldn’t be worried about situations where there is a clear threat. But that’s not quite the class of failures that I’m worried about. Fairness, bias, and adversarial examples are all closer to what I’m getting at. The general pattern is that ML researchers hack together a system that works, but has some problems they’re unaware of. Later, the problems are discovered and the reaction is to hack together a solution. This is pretty much the opposite of the safety mindset EY was talking about. It leaves room for catastrophe in the initial window when the problem goes undetected, and indefinitely afterwards if the hack is insufficient to deal with the issue.
More specifically, I’m worried about a situation where at some point during grad student decent someone says, “That’s funny...” then goes on to publish their work. Later, someone else deploys their idea plus 3 orders of magnitude more computing power and we all die. That, or we don’t all die. Instead we resolve the issue with a hack. Then a couple bumps in computing power and capabilities later we all die.
The above comes across as both paranoid and farfeched, and I’m not sure the AI community will take on the required level of caution to prevent it unless we get an AI equivalent of Chernobyl before we get UFAI. Nuclear reactor design is the only domain I know of where people are close to sufficiently paranoid.
Important thing to remember is that Rohin is explicitly talking about a non-foom scenario, so the assumption is that humanity would survive AI-Chernobyl.
My worry is less that we wouldn’t survive AI-Chernobyl as much as it is that we won’t get an AI-Chernobyl.
I think that this is where there’s a difference in models. Even in a non-FOOM scenario I’m having a hard time envisioning a world where the gap in capabilities between AI-Chernobyl and global catastrophic UFAI is that large. I used Chernobyl as an example because it scared the public and the industry into making things very safe. It had a lot going for it to make that happen. Radiation is invisible and hurts you by either killing you instantly, making your skin fall off, or giving you cancer and birth defects. The disaster was also extremely expensive, with the total costs on the order of 10^11 USD$.
If a defective AI system manages to do something that instils the same level of fear into researchers and the public as Chernobyl did, I would expect that we were on the cusp of building systems that we couldn’t control at all.
If I’m right and the gap between those two events is small, then there’s a significant risk that nothing will happen in that window. We’ll get plenty of warnings that won’t be sufficient to instil the necessary level of caution into the community, and later down the road we’ll find ourselves in a situation we can’t recover from.
My impression is that people working on self-driving cars are incredibly safety-conscious, because the risks are very salient.
I don’t think AI-Chernobyl has to be a Chernobyl level disaster, just something that makes the risks salient. E.g. perhaps an elder care AI robot pretends that all of its patients are fine in order to preserve its existence, and this leads to a death and is then discovered. If hospitals let AI algorithms make decisions about drugs according to complicated reward functions, I would expect this to happen with current capabilities. (It’s notable to me that this doesn’t already happen, given the insane hype around AI.)
Safety conscious people working on self driving cars don’t program their cars to not take evasive action after detecting that a collision is imminent.
I think it already has.(It was for extra care, not drugs, but it’s a clear cut case of a misspecified objective function leading to suboptimal decisions for a multitude of individuals.) I’ll note, perhaps unfairly, that the fact that this study was not salient enough to make it to your attention even with a culture war signal boost is evidence that it needs to be a Chernobyl level event.
I agree that Tesla does not seem very safety conscious (but it’s notable that they are still safer than human drivers in terms of fatalities per mile, if I remember correctly?)
Huh, what do you know.
Faced with an actual example, I’m realizing that what I actually expect would cause people to take it more seriously is a) the belief that AGI is near and b) an example where the AI algorithm “deliberately” causes a problem (i.e. “with full knowledge” that the thing it was doing was not what we wanted). I think most deep RL researchers already believe that reward hacking is a thing (which is what that study shows).
Tangential, but that makes it less likely that I read it; I try to completely ignore anything with the term “racial bias” in its title unless it’s directly pertinent to me. (Being about AI isn’t enough to make it pertinent to me.)
What do you expect the ML community to do at that point? Coordinate to stop or slow down the race to AGI until AI safety/alignment is solved? Or do you think each company/lab will unilaterally invest more into safety/alignment without slowing down capability research much, and that will be sufficient? Or something else?
I worry about a parallel with the “energy community”, a large part of which not just ignores but actively tries to obscure or downplay warning signs about future risks associated with certain forms of energy production. Given that the run-up to AGI will likely generate huge profits for AI companies as well as provide clear benefits for many people (compared to which, the disasters that will have occurred by then may well seem tolerable by comparison), and given probable disagreements between different experts about how serious the future risks are, it seems likely to me that AI risk will become politicized/controversial in a way similar to climate change, which will prevent effective coordination around it.
On the other hand… maybe AI will be more like nuclear power than fossil fuels, and a few big accidents will stall its deployment for quite a while. Is this why you’re relatively optimistic about AI risk being taken seriously, and if so can you share why you think nuclear power is a closer analogy?
It depends a lot on the particular warning shot that we get. But on the strong versions of warning shots, where there’s common knowledge that building an AGI runs a substantial risk of destroying the world, yes, I expect them to not build AGI until safety is solved. (Not to the standard you usually imagine, where we must also solve philosophical problems, but to the standard I usually imagine, where the AGI is not trying to deceive us or work against us.)
This depends on other background factors, e.g. how much the various actors think they are value-aligned vs. in zero-sum competition. I currently think the ML community thinks they are mostly but not fully value-aligned, and they will influence companies and governments in that direction. (I also want more longtermists to be trying to build more common knowledge of how much humans are value aligned, to make this more likely.)
The major disanalogy is that catastrophic outcomes of climate change do not personally affect the CEOs of energy companies very much, whereas AI x-risk affects everyone. (Also, maybe we haven’t gotten clear and obvious warning shots?)
I agree that my story requires common knowledge of the risk of building AGI, in the sense that you need people to predict “running this code might lead to all humans dying”, and not “running this code might lead to <warning shot effect>”. You also need relative agreement on the risks.
I think this is pretty achievable. Most of the ML community already agrees that building an AGI is high-risk if not done with some argument for safety. The thing people tend to disagree on is when we will get AGI and how much we should work on safety before then.
To the extent that we expect strong warning shots and ability to avoid building AGI upon receiving such warning shots, this seems like an argument for researchers/longtermists to work on / advocate for safety problems beyond the standard of “AGI is not trying to deceive us or work against us” (because that standard will likely be reached anyway). Do you agree?
Some types of AI x-risk don’t affect everyone though (e.g., ones that reduce the long term value of the universe or multiverse without killing everyone in the near term).
Yes.
Agreed, all else equal those seem more likely to me.
Ok, I wasn’t sure that you’d agree, but given that you do, it seems that when you wrote the title of this newsletter “Why AI risk might be solved without additional intervention from longtermists” you must have meant “Why some forms of AI risk …”, or perhaps certain forms of AI risk just didn’t come to your mind at that time. In either case it seems worth clarifying somewhere that you don’t currently endorse interpreting “AI risk” as “AI risk in its entirety” in that sentence.
Similarly, on the inside you wrote:
It seems worth clarifying that you’re only optimistic about certain types of AI safety problems.
(I’m basically making the same complaint/suggestion that I made to Matthew Barnett not too long ago. I don’t want to be too repetitive or annoying, so let me know if I’m starting to sound that way.)
Tbc, I’m optimistic about all the types of AI safety problems that people have proposed, including the philosophical ones. When I said “all else equal those seem more likely to me”, I meant that if all the other facts about the matter are the same, but one risk affects only future people and not current people, that risk would seem more likely to me because people would care less about it. But I am optimistic about the actual risks that you and others argue for.
That said, over the last week I have become less optimistic specifically about overcoming race dynamics, mostly from talking to people at FHI / GovAI. I’m not sure how much to update though. (Still broadly optimistic.)
It’s notable that AI Impacts asked for people who were skeptical of AI risk (or something along those lines) and to my eye it looks like all four of the people in the newsletter independently interpreted that as accidental technical AI risk in which the AI is adversarially optimizing against you (or at least that’s what the four people argued against). This seems like pretty strong evidence that when people hear “AI risk” they now think of technical accidental AI risk, regardless of what the historical definition may have been. I know certainly that is my default assumption when someone (other than you) says “AI risk”.
I would certainly support having clearer definitions and terminology if we could all agree on them.
Why? I actually wrote a reply that was more questioning in tone, and then changed it because I found some comments you made where you seemed to be concerned about the additional AI risks. Good thing I saved a copy of the original reply, so I’ll just paste it below:
I wonder if you would consider writing an overview of your perspective on AI risk strategy. (You do have a sequence but I’m looking for something that’s more comprehensive, that includes e.g. human safety and philosophical problems. Or let me know if there’s an existing post that I’ve missed.) I ask because you’re one of the most prolific participants here but don’t fall into one of the existing “camps” on AI risk for whom I already have good models for. It’s happened several times that I see a comment from you that seems wrong or unclear, but I’m afraid to risk being annoying or repetitive with my questions/objections. (I sometimes worry that I’ve already brought up some issue with you and then forgot your answer.) It would help a lot to have a better model of you in my head and in writing so I can refer to that to help me interpret what the most likely intended meaning of a comment is, or to predict how you would likely answer if I were to ask certain questions.
Maybe that’s because the question was asked in a way that indicated the questioner was mostly interested in technical accidental AI risk? And some of them may be fine with defining “AI risk” as “AI-caused x-risk” but just didn’t have the other risks on the top of their minds, because their personal focus is on the technical/accidental side. In other words I don’t think this is strong evidence that all 4 people would endorse defining “AI risk” as “technical accidental AI risk”. It also seems notable that I’ve been using “AI risk” in a broad sense for a while and no one has objected to that usage until now.
The current situation seems to be that we have two good (relatively clear) terms “technical accidental AI risk” and “AI-caused x-risk” and the dispute is over what plain “AI risk” should be shorthand for. Does that seem fair?
Seems right, I think my opinions fall closest to Paul’s, though it’s also hard for me to tell what Paul’s opinions are. I think this older thread is a relatively good summary of the considerations I tend to think about, though I’d place different emphases now. (Sadly I don’t have the time to write a proper post about what I think about AI strategy—it’s a pretty big topic.)
Yes, though I would frame it as “the ~5 people reading these comments have two clear terms, while everyone else uses a confusing mishmash of terms”. The hard part is in getting everyone else to use the terms. I am generally skeptical of deciding on definitions and getting everyone else to use them, and usually try to use terms the way other people use terms.
Agreed with this, but see above about trying to conform with the way terms are used, rather than defining terms and trying to drag everyone else along.
This seems odd given your objection to “soft/slow” takeoff usage and your advocacy of “continuous takeoff” ;)
I don’t think “soft/slow takeoff” has a canonical meaning—some people (e.g. Paul) interpret it as not having discontinuities, while others interpret it as capabilities increasing slowly past human intelligence over (say) centuries (e.g. Superintelligence). If I say “slow takeoff” I don’t know which one the listener is going to hear it as. (And if I had to guess, I’d expect they think about the centuries-long version, which is usually not the one I mean.)
In contrast, I think “AI risk” has a much more canonical meaning, in that if I say “AI risk” I expect most listeners to interpret it as accidental risk caused by the AI system optimizing for goals that are not our own.
(Perhaps an important point is that I’m trying to communicate to a much wider audience than the people who read all the Alignment Forum posts and comments. I’d feel more okay about “slow takeoff” if I was just speaking to people who have read many of the posts already arguing about takeoff speeds.)
AI risk is just a shorthand for “accidental technical AI risk.” To the extent that people are confused, I agree it’s probably worth clarifying the type of risk by adding “accidental” and “technical” whenever we can.
However, I disagree with the idea that we should expand the word AI risk to include philosophical failures and intentional risks. If you open the term up, these outcomes might start to happen:
It becomes unclear in conversation what people mean when they say AI risk
Like The Singularity, it becomes a buzzword.
Journalists start projecting Terminator scenarios onto the words, and now have justification because even the researchers say that AI risk can mean a lot of different things.
It puts a whole bunch of types of risk into one basket, suggesting to outsiders that all attempts to reduce “AI risk” might be equally worthwhile.
ML researchers start to distrust AI risk researchers, because people who are worried about the Terminator are using the same words as the AI risk researchers and therefore get associated with them.
This can all be avoided by having a community norm to clarify that we mean technical accidental risk when we say AI risk, and when we’re talking about other types of risks we use more precise terminology.
I don’t think “AI risk” was originally meant to be a shorthand for “accidental technical AI risk”. The earliest considered (i.e., not off-hand) usage I can find is in the title of Luke Muehlhauser’s AI Risk and Opportunity: A Strategic Analysis where he defined it as “the risk of AI-caused extinction”.
(He used “extinction”, but nowadays we tend to think in terms of “existential risk”, which also includes “permanent large negative consequences”; that seems like a reasonable expansion of “AI risk”.)
I want to include philosophical failures, as long as the consequences of the failures flow through AI, because (aside from historical usage) technical problems and philosophical problems blend into each other, and I don’t see a point in drawing an arbitrary and potentially contentious border between them. (Is UDT a technical advance or a philosophical advance? Is defining the right utility function for a Sovereign Singleton a technical problem or a philosophical problem? Why force ourselves to answer these questions?)
As for “intentional risks” it’s already common practice to include that in “AI risk”:
Besides that, I think there’s also a large grey area between “accident risk” and “misuse” where the risk partly comes from technical/philosophical problems and partly from human nature. For example, humans might be easily persuaded by wrong but psychologically convincing moral/philosophical arguments that AIs can come up with, and then order their AIs to do terrible things. Even pure intentional risks might have technical solutions. Again, I don’t really see the point of trying to figure out which of these problems should be excluded from “AI risk”.
It seems perfectly fine to me to use that as shorthand for “AI-caused x-risk” and use more specific terms when we mean more specific risks.
What do you mean? Like people will use “AI risk” when their project has nothing to do with “AI-caused x-risk”? Couldn’t they do that even if we define “AI risk” to be “accidental technical AI risk”?
Terminator scenarios seem to be scenarios of “accidental technical AI risk” (they’re just not very realistic scenarios) so I don’t see how defining “AI risk” to mean that would prevent journalists from using Terminator scenarios to illustrate “AI risk”.
I don’t think this is a good argument, because even within “accidental technical AI risk” there are different problems that aren’t equally worthwhile to solve, so why aren’t you already worried about outsiders thinking all those problems are equally worthwhile?
See my response above regarding “Terminator scenarios”.
I propose that we instead stick with historical precedent and keep “AI risk” to mean “AI-caused x-risk” and use more precise terminology to refer to more specific types of AI-caused x-risk that we might want to talk about. Aside from what I wrote above, it’s just more intuitive/commonsensical that “AI risk” means “AI-caused x-risk” in general instead of a specific kind of AI-caused x-risk.
However I appreciate that someone who works mostly on the less philosophical / less human-related problems might find it tiresome to say or type “technical accidental AI risk” all the time to describe what they do or to discuss the importance of their work, and can find it very tempting to just use “AI risk”. It would probably be good to create a (different) shorthand or acronym for it to remove this temptation and to make their lives easier.
I appreciate the arguments, and I think you’ve largely convinced me, mostly because of the historical argument.
I do still have some remaining apprehension about using AI risk to describe every type of risk arising from AI.
That is true. The way I see it, UDT is definitely on the technical side, even though it incorporates a large amount of philosophical background. When I say technical, I mostly mean “specific, uses math, has clear meaning within the language of computer science” rather than a more narrow meaning of “is related to machine learning” or something similar.
My issue with arguing for philosophical failure is that, as I’m sure you’re aware, there’s a well-known failure mode of worrying about vague philosophical problems rather than more concrete ones. Within academic philosophy, the majority of discussion surrounding AI is centered on consciousness, intentionality, whether it’s possible to even construct a human-like machine, whether they should have rights, etc.
There’s a unique thread of philosophy that arose from LessWrong, which includes work on decision theory, that doesn’t focus on these thorny and low-priority questions. While I’m comfortable with you arguing that philosophical failure is important, my impression is that the overly philosophical approach used by many people has done more harm than good for the field in the past, and continues to do so.
It is therefore sometimes nice to tell people that the problems that people work on here are concrete and specific, and don’t require doing a ton of abstract philosophy or political advocacy.
This is true, but my impression is that when you tell people that a problem is “technical” it generally makes them refrain from having a strong opinion before understanding a lot about it. “Accidental” also reframes the discussion by reducing the risk of polarizing biases. This is a common theme in many fields:
Physicists sometimes get frustrated with people arguing about “the philosophy of the interpretation of quantum mechanics” because there’s a large subset of people who think that since it’s philosophical, then you don’t need to have any subject-level expertise to talk about it.
Economists try to emphasize that they use models and empirical data, because a lot of people think that their field of study is more-or-less just high status opinion + math. Emphasizing that there are real, specific models that they study helps to reduce this impression. Same with political science.
A large fraction of tech workers are frustrated about the use of Machine Learning as a buzzword right now, and part of it is that people started saying Machine Learning = AI rather than Machine Learning = Statistics, and so a lot of people thought that even if they don’t understand statistics, they can understand AI since that’s like philosophy and stuff.
Scott Aaronson has said
My guess is that this shift in his thinking occurred because a lot of people started talking about technical risks from AI, rather than framing it as a philosophy problem, or a problem of eliminating bad actors. Eliezer has shared this viewpoint for years, writing in the CEV document,
reflecting the temptation to derail discussions about technical accidental risks.
Also, isn’t defining “AI risk” as “technical accidental AI risk” analogous to defining “apple” as “red apple” (in terms of being circular/illogical)? I realize natural language doesn’t have to be perfectly logical, but this still seems a bit too egregious.
I agree that this is troubling, though I think it’s similar to how I wouldn’t want the term biorisk to be expanded to include biodiversity loss (a risk, but not the right type), regular human terrorism (humans are biological, but it’s a totally different issue), zombie uprisings (they are biological, but it’s totally ridiculous), alien invasions etc.
Not to say that’s what you are doing with AI risk. I’m worried about what others will do with it if the term gets expanded.
Well, as I said, natural language doesn’t have to be perfectly logical, and I think “biorisk” is somewhat in that category, but there’s an explanation that makes it a bit more reasonable than it might first appear, which is that the “bio” refers not to “biological” but to “bioweapon”. This is actually one of the definitions that Google gives when you search for “bio”: “relating to or involving the use of toxic biological or biochemical substances as weapons of war. ‘bioterrorism’”
I guess the analogous thing would be if we start using “AI” to mean “technical AI accidents” in a bunch of phrases, which feels worse to me than the “bio” case, maybe because “AI” is a standalone word/acronym instead of a prefix? Does this make sense to you?
But the term was expanded from the beginning. Have you actually observed it being used in ways that you fear (and which would be prevented if we were to redefine it more narrowly)?
Yeah, that makes sense. Your points about “bio” not being short for “biological” were valid, but the fact that as a listener I didn’t know that implies that it’s really easy to mess up the language usage here. I’m starting to think that the real fight should be against using terms that aren’t self-explanatory.
I’m not sure whether it would have been prevented by using the term more narrowly, but in my experience the most common reaction people outside of EA/LW (and even sometimes within) have to hearing about AI risk is to assume that it’s not technical, and to assume that it’s not about accidents. In that sense, I have been exposed to quite a bit of this already.
Tangential, but I wouldn’t be surprised if researchers were fairly quickly aware of the issue (e.g. within two years of the original GAN paper), and it just took a while to become common knowledge because it isn’t particularly flashy. (There’s a surprising-to-me amount of know-how that is stored in researchers’ brains and never put down on paper.)
I mean, the solution is to use a VAE. If you care about covering modes but not image quality, you choose a VAE; if you care about image quality but not covering modes, you choose a GAN.
(Also, while I know very little about VAEs / GANs, Implicit Maximum Likelihood Estimation sounded like a principled fix to me.)
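To make the tradeoff concrete, here is a minimal sketch of the two objectives on a toy 2-D problem. This is purely my own illustration, not code from this thread or from the papers mentioned: PyTorch is an assumed dependency, and the dataset, layer sizes, and function names (neg_elbo, generator_loss) are made up for the example. The point it tries to show is that the VAE’s negative ELBO contains a per-datapoint reconstruction/likelihood term, so missing a mode of the training data is directly penalized, while the GAN generator is graded only on fooling the discriminator.

```python
# Illustrative sketch only (not from the thread or any cited paper): PyTorch is
# assumed, the dataset is a stand-in, and all layer sizes are made up.
import torch
import torch.nn as nn
import torch.nn.functional as F

class VAE(nn.Module):
    """Toy VAE on 2-D data; trained by minimizing the negative ELBO."""
    def __init__(self, x_dim=2, z_dim=2, h=64):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(x_dim, h), nn.ReLU())
        self.mu = nn.Linear(h, z_dim)
        self.logvar = nn.Linear(h, z_dim)
        self.dec = nn.Sequential(nn.Linear(z_dim, h), nn.ReLU(), nn.Linear(h, x_dim))

    def forward(self, x):
        hidden = self.enc(x)
        mu, logvar = self.mu(hidden), self.logvar(hidden)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization trick
        return self.dec(z), mu, logvar

def neg_elbo(x, x_rec, mu, logvar):
    # Reconstruction term (Gaussian log-likelihood up to constants): every training
    # point must be explained, so dropping a mode of the data directly raises the loss.
    rec = F.mse_loss(x_rec, x, reduction="sum")
    # KL term pulling the approximate posterior toward the N(0, I) prior.
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return rec + kl

# Toy GAN pieces: the generator is scored only on fooling the discriminator.
generator = nn.Sequential(nn.Linear(2, 64), nn.ReLU(), nn.Linear(64, 2))
discriminator = nn.Sequential(nn.Linear(2, 64), nn.ReLU(), nn.Linear(64, 1))

def generator_loss(z):
    fake = generator(z)
    # Non-saturating generator loss: nothing here measures coverage of the data
    # distribution, so mapping all noise to one convincing mode can still score well.
    return F.binary_cross_entropy_with_logits(
        discriminator(fake), torch.ones(z.shape[0], 1))

# Quick smoke test with stand-in data (real data would be multimodal, e.g. a mixture of Gaussians).
x = torch.randn(128, 2)
x_rec, mu, logvar = VAE()(x)
print(neg_elbo(x, x_rec, mu, logvar).item(), generator_loss(torch.randn(128, 2)).item())
```

Nothing in the generator objective looks at how much of the data distribution is covered, which is why mode dropping can go unnoticed as long as the samples you do inspect look good; likelihood-based objectives (or, plausibly, something like IMLE) build coverage into the loss instead.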
Agreed, I would guess that the researchers / engineers knew this was risky and thought it was worth it anyway. Or perhaps the managers did. But I do agree this is evidence against my position.
Why isn’t the threat clear once the problems are discovered?
Part of my claim is that we probably will get that (assuming AI really is risky), though perhaps not Chernobyl-level disaster, but still something with real negative consequences that “could be worse”.
I think I should be more specific. When you say:
I mean that no one sane who knows that will run that AI system with > X amount of computing power. When I wrote that comment, I also thought that anyone sane would blow the whistle in that event. See my note at the end of the comment.*
However, when presented with that evidence, I don’t expect the AI community to react appropriately. The correct response to that evidence is to stop what you’re doing and revisit the entire process and culture that led to the creation of an algorithm that will kill us all if run with >X amount of compute. What I expect will happen is that the AI community will try to solve the problem the same way it’s solved every other problem it has encountered: it will try an inordinate number of unprincipled hacks to get around the issue.
Conditional on no FOOM, I can definitely see plenty of events with real negative consequences that “could be worse”. However, I claim that anything short of a Chernobyl-level event won’t shock the community and the world into changing its culture or trying to coordinate. I also claim that the capabilities gap between a Chernobyl-level event and a global catastrophic event is small, such that even in a non-FOOM scenario the former might not happen before the latter. Together, I think there is a high probability that we will not get a disaster that is scary enough to get the AI community to change its culture and coordinate before it’s too late.
*Now that I think about it more, though, I’m less sure. Undergraduate engineers get entire lectures dedicated to how and when to blow the whistle when faced with unethical corporate practices and dangerous projects or designs. When working, they also have insurance and some degree of legal protection from vengeful employers. Even then, you still see cover-ups of shortcomings that lead to major industrial disasters. For instance, long before the disaster, someone had determined that the Fukushima plant was indeed vulnerable to large tsunami impacts. The pattern where someone knows that something will go wrong but nothing is done to prevent it, for one reason or another, is not that uncommon in engineering disasters. Regardless of whether this is due to hindsight bias or an inadequate process for addressing safety issues, these disasters still happen regularly in fields with far more conservative, cautious, and safety-oriented cultures.
I find it unlikely that the field of AI will change its culture from one of moving fast and hacking to something even more conservative and cautious than the cultures of consumer aerospace and nuclear engineering.
Idk, I don’t know what to say here. I meet lots of AI researchers, and the best ones seem to me to be quite thoughtful. I can say what would change my mind:
I take the exploration of unprincipled hacks as very weak evidence against my position, if it’s just in an academic paper. My guess is the researchers themselves would not advocate deploying their solution, or would say that it’s worth deploying but it’s an incremental improvement that doesn’t solve the full problem. And even if the researchers don’t say that, I suspect the companies actually deploying the systems would worry about it.
I would take the deployment of unprincipled hacks more seriously as evidence, but even there I would want to be convinced that shutting down the AI system was a better decision than deploying an unprincipled hack. (Because then I would have made the same decision in their shoes.)
Unprincipled hacks are in fact quite useful for the vast majority of problems; as a result it seems wrong to attribute irrationality to people because they use unprincipled hacks.