FWIW, I think I represent the majority of safety researchers in saying that you shouldn’t be too concerned with your effect on capabilities; there are many more people pushing capabilities, so most safety research is likely a drop in the capabilities bucket (although there may be important exceptions!).
Personally, I agree that improving social institutions seems more important for reducing AI-Xrisk ATM than technical work. Are you doing that? There are options for that kind of work as well, e.g. at FHI.
I had been thinking about metrics for measuring progress towards shared agreed outcomes as a method of co-ordination between potentially competitive powers to avoid arms races.
I passed the draft around to a couple of the usual suspects in AI metrics/risk mitigation in hopes of getting collaborators, but no joy. I learnt that Jack Clark of OpenAI is looking at that kind of thing as well and is a lot better positioned to act on it, so I have hopes around that.
Moving on from that, I’m thinking that we might need a broad base of support from people (depending upon the scenario), so being able to explain how people could still have meaningful lives post AI is important for building that support. So I’ve been thinking about that.
This sounds like it would be useful for getting people to support the development of AGI, rather than effective global regulation of AGI. What am I missing?
For longer time frames where there might be visible development, the public needs to trust the political regulators of AI to have their interests at heart. Otherwise they may try to make it a party-political issue, which I think would be terrible for sane global regulation.
I’ve come across pretty strong emotion when talking about AGI even when talking about safety, which I suspect will come bubbling to the fore more as time goes by.
It may also help morale of the thoughtful people trying to make safe AI.
If you’re able to contribute equally to technical safety work and institution-oriented work, my own advice would generally be to prioritize technical work. I agree with capybarelet, though, that safety researchers should be willing to do work that might synergize with capabilities research, where the tradeoff looks worth it.
On the other hand, I think “don’t worry about how your research (or other actions) will impact AGI timelines or development trajectories, because whatever you’re doing is probably a drop in the bucket” is a bad meme to propagate. Some of the buckets that matter aren’t that large, and the drops may be much larger for some of the researchers who are particularly adept at making safety breakthroughs. (And public advice should plausibly be skewed toward those people, since most of the expected impact of advice may come from its influence on large-drop people.)
Can you give some arguments for these views?
I think the best argument against institution-oriented work is that it might be harder to make a big impact. But more importantly, I think strong global coordination is necessary and sufficient, whereas technical safety is plausibly neither.
I also agree that one should consider tradeoffs, sometimes. But every time someone has raised this concern to me (I think it’s been 3x?) I think it’s been a clear cut case of “why are you even worrying about that”, which leads me to believe that there are a lot of people who are overconcerned about this.
I would have said that strong global coordination before we get to AGI isn’t necessary. I’d also have said that strong global coordination without an alignment solution is insufficient, given that it’s not realistic to shoot for levels of coordination like “let’s just never build AGI”. (My model of Nate would also add here that never building AGI would mean losing an incredible amount of cosmopolitan value, enough to count as an existential catastrophe in its own right.)
Maybe we could start with you saying why you think it’s necessary and sufficient? That might give me a better understanding of what you have in mind by “institution-oriented work”.
I wouldn’t be at all surprised if lots of people are overconcerned about this. Many people are also underconcerned, though. I feel better about public advice that encourages people to test their models of the size of relevant drops and relevant buckets, rather than just trying to correct for a bias some people have in a particular direction (which makes overcorrection easy).
I like this sentence a lot.
So my original response was to the statement:
Differential research that advances safety more than AI capability still advances AI capability.
Which seems to suggest that advancing AI capability is sufficient reason to avoid technical safety that has non-trivial overlap with capabilities. I think that’s wrong.
RE the necessary and sufficient argument:
1) Necessary: it’s unclear that a technical solution to alignment would be sufficient, since our current social institutions are not designed for superintelligent actors, and we might not develop effective new ones quickly enough
2) Sufficient: I agree that never building AGI is a potential Xrisk (or close enough). I don’t think it’s entirely unrealistic “to shoot for levels of coordination like ‘let’s just never build AGI’”, although I agree it’s a long shot. Supposing we have that level of coordination, we could use “never build AGI” as a backup plan while we work to solve technical safety to our satisfaction, if that is in fact possible.
Yeah, I agree with that; my above suggestion is taking into account that this is a likely case of overconcern.
This sounds weaker to me than what I usually think of as a “necessary and sufficient” condition.
My view is more or less the one Eliezer points to here:
The big big problem is, “Nobody knows how to make the nice AI.” You ask people how to do it, they either don’t give you any answers or they give you answers that I can shoot down in 30 seconds as a result of having worked in this field for longer than five minutes.
It doesn’t matter how good their intentions are. It doesn’t matter if they don’t want to enact a Hollywood movie plot. They don’t know how to do it. Nobody knows how to do it. There’s no point in even talking about the arms race if the arms race is between a set of unfriendly AIs with no friendly AI in the mix.
And the one in the background when he says a competitive AGI project can’t deal with large slowdowns:
Because I don’t think you can get the latter degree of advantage over other AGI projects elsewhere in the world. Unless you are postulating massive global perfect surveillance schemes that don’t wreck humanity’s future, carried out by hyper-competent, hyper-trustworthy great powers with a deep commitment to cosmopolitan value — very unlike the observed characteristics of present great powers, and going unopposed by any other major government.
I would say that actually solving the technical problem clearly is necessary for good outcomes, whereas strong pre-AGI global coordination is helpful but not necessary. And the scenario where a leading AI company just builds sufficiently aligned AGI, runs it, and saves the world doesn’t strike me as particularly implausible, relative to other ‘things turn out alright’ outcomes; whereas the scenario where world leaders like Trump, Putin, and Xi Jinping usher in a permanent otherwise-utopian AGI-free world government does strike me as much crazier than the ten or hundred likeliest ‘things turn out alright’ scenarios.
In general, better coordination reduces the difficulty of the relevant technical challenges, and technical progress reduces the difficulty of the relevant coordination challenges; so both are worth pursuing. I do think that (e.g.) reducing x-risk by 5% with coordination work is likely to be much more difficult than reducing it by 5% with technical work, and I think the necessity and sufficiency arguments are much weaker for ‘just try to get everyone to be friends’ approaches than for ‘just try to figure out how to build this kind of machine’ approaches.
There are probably no fire alarms for “nice AI designs” either, just like there are no fire alarms for AI in general.
Why should we expect people to share “nice AI designs”?