Conversely, if we had a complete technical solution, I don’t see why we necessarily need that much governance competence.
As I said in the article, technically controllable ASIs are the equivalent of an invasive species which will displace humans from Earth politically, economically and militarily.
And I’m saying that, assuming all the technical problems are solved, AI researchers would be the ones in control, and I (mostly) trust them to just not do things like build an AI that acts like an invasive species, or argues for its own rights, or build something that actually deserves such rights.
Maybe some random sociologists on Twitter will call for giving AIs rights, but in the counterfactual world where AI researchers have fine control of their own creations, I expect no one in a position to make decisions on the matter to give such calls any weight.
Even in the world we actually live in, I expect such calls to have little consequence. I do think some of the things you describe are reasonably likely to happen, but the people responsible for making them happen will do so unintentionally, with opinion columnists, government regulations, etc. playing little or no role in the causal process.
What is the basis of this trust? Anecdotal impressions of a few that you know personally in the space, opinion polling data, something else?
A bit of anecdotal impressions, yes, but mainly I just think that, in humans, being smart, conscientious, reflective, etc. enough to be the brightest researcher at a big AI lab is actually pretty correlated with being Good (and also, that once you actually solve the technical problems, it doesn’t take that much Goodness to do the right thing for the collective and not just yourself).
Or, another way of looking at it, I find Scott Aaronson’s perspective convincing, when it is applied to humans. I just don’t think it will apply at all to the first kinds of AIs that people are actually likely to build, for technical reasons.
I think there are way more transhumanists and post-humanists at AGI labs than you imagine. Richard Sutton is a famous example (btw, I’ve just discovered that he moved from DeepMind to Keen Technologies, John Carmack’s venture), and I believe there are many more who disguise themselves for political reasons.
No. You have simplistic and incorrect beliefs about control.
If there are a bunch of companies (DeepMind, Anthropic, Meta, OpenAI, …) and a bunch of regulation efforts and politicians who all have input, then the AI researchers will have very little control authority, perhaps as little as the physicists had over the use of the H-bomb.
Where does the control really reside in this system?
Who made the decision to almost launch a nuclear torpedo in the Cuban Missile Crisis?
In the Manhattan project, there was no disagreement between the physicists, the politicians / generals, and the actual laborers who built the bomb, on what they wanted the bomb to do. They were all aligned around trying to build an object that would create the most powerful explosion possible.
As for who had control over the launch button, of course the physicists didn’t have that, and never expected to. But they also weren’t forced to work on the bomb; they did so voluntarily and knowing they wouldn’t be the ones who got any say in whether and how it would be used.
Another difference between an atomic bomb and AI is that the bomb itself had no say in how it was used. Once a superintelligence is turned on, control of the system rests entirely with the superintelligence and not with any humans. I strongly expect that researchers at big labs will not be forced to program an ASI to do bad things against the researchers’ own will, and I trust them not to do so voluntarily. (Again, all in the probably-counterfactual world where they know and understand all the consequences of their own actions.)
In the Manhattan project, there was no disagreement between the physicists, the politicians / generals, and the actual laborers who built the bomb, on what they wanted the bomb to do.
In that they wanted the bomb to explode? I think the analogous level of control for AI would be unsatisfactory.
they did so voluntarily and knowing they wouldn’t be the ones who got any say in whether and how it would be used.
I’m not sure they thought this; I think many expected that by playing along they would have influence later. Tech workers today often seem to care a lot about how products made by their companies are deployed.
In that they wanted the bomb to explode? I think the analogous level of control for AI would be unsatisfactory.
The premise of this hypothetical is that all the technical problems are solved—if an AI lab wants to build an AI to pursue the collective CEV of humanity or whatever, they can just get it to do that. Maybe they’ll settle on something other than CEV that is a bit better or worse or just different, but my point was that I don’t expect them to choose something ridiculous like “our CEO becomes god-emperor forever” or whatever.
Yeah, I was probably glossing over the actual history a bit too much; most of my knowledge on this comes from seeing Oppenheimer recently. The actual dis-analogy is that no AI researcher would really be arguing for not building and deploying ASI in this scenario, vs. with the atomic bomb where lots of people wanted to build it to have around, but not actually use it or only use it as some kind of absolute last resort. I don’t think many AI researchers in our actual reality have that kind of view on ASI, and probably few to none would have that view in the counterfactual where the technical problems are solved.
researchers at big labs will not be forced to program an ASI to do bad things against the researchers’ own will
Well, these systems aren’t programmed. Researchers work on architecture and engineering; goal content is down to the RLHF that is applied and the wishes of the user(s), and the wishes of the user(s) are determined by market forces, user preferences, etc. And user preferences may themselves be influenced by other AI systems.
Closed source models can have RLHF and be delivered via an API, but open source models will not be far behind at any given point in time. And of course prompt injection attacks can bypass the RLHF on even closed source models.
The decisions about what RLHF to apply on contentious topics will come from politicians and from the leadership of the companies, not from the researchers. And politicians are influenced by the media and elections, and company leadership is influenced by the market and by cultural trends.
Where does the chain of control ultimately ground itself?
Answer: it doesn’t. Control of AI in the current paradigm is floating. Various players can influence it, but there’s no single source of truth for “what’s the AI’s goal”.
I don’t dispute any of that, but I also don’t think RLHF is a workable method for building or aligning a powerful AGI.
Zooming out, my original point was that there are two problems humanity is facing, quite different in character but both very difficult:
a coordination / governance problem, around deciding when to build AGI and who gets to build it
a technical problem, around figuring out how to build an AGI that does what the builder wants at all.
My view is that we are currently on track to solve neither of those problems. But if you actually consider what the world in which we sufficiently-completely solve even one of them looks like, it seems like either is sufficient for a relatively high probability of a relatively good outcome, compared to where we are now.
Both possible worlds are probably weird hypotheticals which shouldn’t affect our actual strategy in the world we actually live in, which is of course to pursue solutions to both problems simultaneously with as much vigor as possible. But it still seems worth keeping in mind that if even one thing works out sufficiently well, we probably won’t be totally doomed.
How does a solution to the above solve the coordination/governance problem?
I think the theory is something like the following: We build the guaranteed trustworthy AI, and ask it to prevent the creation of unaligned AI, and it comes up with the necessary governance structures, and the persuasion and force needed to implement them.
I’m not sure this argument goes through. Some political actions are simply impossible to accomplish ethically, and therefore unavailable to a “good” actor even given superhuman abilities.
In the Manhattan project, there was no disagreement between the physicists, the politicians / generals, and the actual laborers who built the bomb, on what they wanted the bomb to do. They were all aligned around trying to build an object that would create the most powerful explosion possible.
Where did you learn of this?
From what I know it was the opposite: there were so many disagreements, even just among the physicists, that they decided to duplicate nearly all effort and pursue two different nuclear device designs, the gun type and the implosion type, simultaneously.
E.g. both plutonium and uranium processing supply chains were set up at massive expense (and later environmental damage), just in case one design didn’t work.
Without commenting on whether there was in fact much agreement or disagreement among the physicists, this doesn’t sound like much evidence of disagreement. I think it’s often entirely reasonable to try two technical approaches simultaneously, even if everyone agrees that one of them is more promising.
You do realize setting up each supply chain alone took up well over 1% of total US GDP, right?
I didn’t know that, but not a crux. This information does not make me think it was obviously unreasonable to try both approaches simultaneously.
(Downvoted for tone.)
How does this relate to the discussion Max H and Roko were having? Or the question I asked of Max H?
I don’t know, I didn’t intend it to relate to those things. It was a narrow reply to something in your comment, and I attempted to signal it as such.
(I’m not very invested in this conversation and currently intend to reply at most twice more.)
Okay then.
So you don’t think a pivotal act exists? Or, more ambitiously, you don’t think a sovereign implementing CEV would result in a good enough world?
Who is going to implement CEV or some other pivotal act?
Ah, I see. Yeah, that’s a reasonable worry. Any ideas on how someone in those orgs could incentivize such behavior whilst discouraging poorly thought out pivotal acts? I would be OK with a future where e.g. OAI gets 90-99% of the cosmic endowment as long as the rest of us get a chunk, or get the chance to safely grow to the point where we have a shot at the vast scraps OAI leaves behind.
The fact that we are having this conversation simply underscores how dangerous this is and how unprepared we are.
This is the future of the universe we’re talking about. It shouldn’t be a footnote!