It seems to me that a major crux about AI strategy routes through “is civilization generally adequate or not?”. It seems like people have pretty different intuitions and ontologies here. Here’s an attempt at some questions of varying levels of concreteness, to tease out some worldview implications.
(I normally use the phrase “civilizational adequacy”, but I think that’s kinda a technical term that means a specific thing and I think maybe I’m pointing at a broader concept.)
“Does civilization generally behave sensibly?” This is a vague question, some possible subquestions:
Do you think major AI orgs will realize that AI is potentially world-endingly dangerous, and have any kind of process at all to handle that? [edit: followup: how sufficient are those processes?]
Do you think government intervention on AI regulations or policies will be net-positive or net-negative, for purposes of preventing x-risk?
How quickly do you think the AI ecosystem will update on new “promising” advances (either in the realm of capabilities or the realm of safety)?
How many intelligent, sensible people do there seem to be in the world who are thinking about AGI? (Order of magnitude: is it more like 1, 10, 1000, or 100,000?)
What’s your vague emotional valence towards “civilization generally”, “the AI ecosystem in particular”, or “the parts of government that engage with AGI”?
[edit: Does any of this feel like it’s cruxy for your views on AI?]
“Why do you believe what you believe about civilizational adequacy?”
What new facts would change your mind about any of the above questions?
What were the causal nodes in your history that led you to form your opinions about the above?
If you imagine turning out to be wrong/confused in some deep way about any of the above, do you have a sense of why that would turn out to be?
“If all of these questions feel ill-formed, can you substitute some questions in your own ontology and answer them instead?”
I don’t think this is the main crux—disagreements about mechanisms of intelligence seem far more important—but to answer the questions:
Do you think major AI orgs will realize that AI is potentially world-endingly dangerous, and have any kind of process at all to handle that?
Clearly yes? They have safety teams that are focused on x-risk? I suspect I have misunderstood your question.
(Maybe you mean the bigger tech companies like FAANG, in which case I’m still at > 95% on yes, but I suspect I am still misunderstanding your question.)
(I know less about Chinese orgs but I still think “probably yes” if they become major AGI orgs.)
Do you think government intervention on AI regulations or policies will be net-positive or net-negative, for purposes of preventing x-risk?
Net positive, though mostly because it seems kinda hard to be net negative relative to “no regulation at all”, not because I think the regulations will be well thought out. The main tradeoff that companies face seems to be speed / capabilities vs safety; it seems unlikely that even “random” regulations increase the speed and capabilities that companies can achieve. (Though it’s certainly possible, e.g. a regulation for openness of research / reducing companies’ ability to keep trade secrets.)
Note I am not including “the military races to get AGI” since that doesn’t seem within scope, but if we include that, I think I’m at net negative but my view here is really unstable.
How quickly do you think the AI ecosystem will update on new “promising” advances (either in the realm of capabilities or the realm of safety)?
Not sure how to answer this. Intellectual thought leaders will have their own “grand theory” hobbyhorses, from which they are unlikely to update very much. (This includes the participants in these dialogues.) But the communities as a whole will switch between “grand theories”; currently the timescale is 2–10 years. That timescale will shorten over time, but the interval between such switches will lengthen. Later on, once we start getting into takeoff (e.g. GDP growth rate doubles), both the timescale and the interval between switches shorten (but also most of my predictions are now very low probability, because the world will have changed a bunch in ways I didn’t foresee).
For simpler advances like “new auxiliary loss that leads to more efficient learning” or “a particular failure mode and how to avoid it by retraining your human raters”, the speed at which they are incorporated depends on complexity of implementation, amount of publicity, economic value, etc., but typical numbers are 1 week to 1 year.
How many intelligent, sensible people do there seem to be in the world who are thinking about AGI? (Order of magnitude: is it more like 1, 10, 1000, or 100,000?)
“Sensible” seems so subjective that predictions about it don’t actually communicate information between people, so I’m going to ignore that part. With that caveat, I’d say currently 300 FTE-equivalents in or adjacent to AI safety, and 1000 FTE-equivalents on AGI more generally.
What’s your vague emotional valence towards “civilization generally”, “the AI ecosystem in particular”, or “the parts of government that engage with AGI”?
Civilization generally seems pretty pathetic at doing good things relative to what “could” be accomplished. Incentives are frequently crazy. Good things frequently don’t happen because of a minority with veto powers. Decision-makers frequently take knowably bad actions that would look good to the people judging them. I am pretty frustrated with it.
Also, the world still runs mostly smoothly, and most of the people involved seem to be doing reasonable things given the constraints on them (+ natural human inclinations like risk aversion).
Peer review is extremely annoying to navigate and one of the biggest benefits of DeepMind relative to academia is that I don’t have to deal with it nearly as much. Reviewers seem unable to grasp basic conceptual arguments if they’re at all different from the standard style of argument in papers. Less often but still way too frequently they display failures of basic reading comprehension. The specific people I meet in the AI ecosystem are better but still have frustrating amounts of Epistemic Learned Helplessness and other heuristics like “theory is useless without experiments” that effectively mean that they can’t ever make a hard update.
Also, more senior people are more competent and have fewer of these properties; people can and do change their minds over time (albeit over the course of years rather than hours); and the things AI people say often seem about as reasonable as the things that AI safety people say (controlling for seniority).
I’m scared of the zero-sum thinking that I’m told goes on in the military. Outside of the military, governments feel like hugely influential players whose actions are (currently) very unpredictable. It feels incredibly high-stakes, but also has the potential to be incredibly good.
What new facts would change your mind about any of the above questions?
Though it isn’t a “fact”, one route is to show me an abstract theory that does well at predicting civilizational responses, especially the parts that are not high profile (e.g. I want a theory that doesn’t just explain COVID response—it also explains seatbelt regulations, how property taxes are set, and why knives aren’t significantly regulated, etc).
What were the causal nodes in your history that led you to form your opinions about the above?
Those were a lot of pretty different opinions, each of which has lots of different causal nodes. I currently don’t see a big category that influenced all of them, so I think I’m going to punt on this question.
If you imagine turning out to be wrong/confused in some deep way about any of the above, do you have a sense of why that would turn out to be?
I only looked at things that can be observed online, rather than at all the other ways that humans interact with the world; those other ways would have demonstrated obvious flaws in my answers.
Thanks. I wasn’t super satisfied with the way I phrased my questions. I just made some slight edits to them (labeled as such), although they still don’t feel like they quite do the thing. (I feel like I’m looking at a bunch of subtle frame disconnects, while multiple other frame disconnects are going on, so pinpointing the thing is hard.)
I think “is any of this actually cruxy” is maybe the most important question and I should have included it. You answered “not super much, at least compared to models of intelligence”. Do you think there’s any similar nearby thing that feels more relevant on your end?
In any case, thanks for your answers; they do help give me more of a sense of the gestalt of your worldview here, however relevant it is.
It’s definitely cruxy in the sense that changing my opinions on any of these would shift my p(doom) some amount.
My rough model is that there’s an unknown quantity about reality which is roughly “how strong does the oversight process have to be before the trained model does what the oversight process intended for it to do”. p(doom) mainly depends on whether the actors training the powerful systems have sufficiently powerful oversight processes. This seems primarily affected by the quality of technical alignment solutions, but certainly civilizational adequacy also affects the answer.
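To make that rough model slightly more explicit (purely illustrative notation, not a precise formalization): let $S_{\text{req}}$ be the unknown oversight strength that reality requires before the trained model does what the oversight process intended, and let $S_{\text{ach}}$ be the oversight strength that the actors training powerful systems actually achieve. Then, very roughly,

$$p(\text{doom}) \approx \Pr\big[\, S_{\text{ach}} < S_{\text{req}} \,\big],$$

where $S_{\text{ach}}$ is driven primarily by the quality of technical alignment solutions, with civilizational adequacy as a secondary input, and $S_{\text{req}}$ is the unknown quantity about reality.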