In order for humans to survive the AI transition, I think we need to succeed on the technical problems of alignment (which are perhaps not as bad as Less Wrong culture made them out to be), and we also need to “land the plane” of superintelligent AI on a stable equilibrium where humans are still the primary beneficiaries of civilization, rather than a pest species to be exterminated or squatters to be evicted.
Do we really need both? It seems like either a technical solution OR competent global governance would mostly suffice.
Actually-competent global governance should be able to coordinate around just not building AGI (and preventing anyone else from building it) indefinitely. If we could solve a coordination problem on that scale, we could also probably solve a bunch of other mundane coordination problems, governance issues, unrelated x-risks, etc., resulting in a massive boost to global prosperity and happiness through non-AI technological progress and good policy.
Conversely, if we had a complete technical solution, I don’t see why we necessarily need that much governance competence. Even if takeoff turns out to be relatively slow, the people initially building and controlling AGI will probably be mostly researchers in big labs.
Maybe ideally we would want a “long reflection” of some kind, but in the probably-counterfactual world where these researchers actually get exactly what they aim for, I mostly trust them to aim the AI at something like “fill the galaxy with humanity’s collective coherent extrapolated volition”, and that seems good enough in a pinch / hurry, if it actually works.
Without governance you’re stuck trusting that the lead researcher (or whoever is in control) turns down near-infinite power and instead acts selflessly. That seems like quite the gamble.
I don’t think it’s such a stark choice. I think odds are the lead researcher takes the infinite power, and it turns out okay to great. Corrigibility seems like the safest outer alignment plan, and it’s got to be corrigible to some set of people in particular. I think giving one random person near-infinite power will work out way better than intuition suggests. I think it’s not power that corrupts, but rather the pursuit of power. I think unlimited power will lead an ordinary, non-sociopathic person to progressively focus more on their empathy for others. I think they’ll ultimately use that power to let others do whatever they want that doesn’t take away others’ freedom to do what they want. And that’s the best outer alignment result, in my opinion.
Alexander Wales, at the end of his series ‘Worth the Candle’, does a lovely job of laying out what a genuinely kind person given omnipotence could do to make the world a nice place for everyone. It’s a lovely vision, but relying on it in practice seems a lot less trustworthy to me than having a bureaucratic process with checks & balances in charge. I mean, I still think it’ll ultimately have to be some relatively small team in charge of a model corrigible to them, if we’re in a singleton scenario. I have a lot more faith in ‘small team with bureaucratic oversight’ than in some individual tech bro selected semi-randomly from the set of researchers at big AI labs who might be presented with the opportunity to ‘get the jump’ on everyone else.
I’m curious why you trust a small group of government bros a lot more than one tech bro. I wouldn’t strongly prefer either, but I’d prefer Sam Altman or Demis Hassabis to a randomly chosen bureaucrat. I don’t totally trust those guys, but I think it’s pretty likely they’re not total sociopaths or idiots.
By the opportunity to get the jump on everyone else, do you mean beating other companies to AGI, or becoming the one guy your AGI takes orders from?
I meant stealing control of an AGI within the company before the rest of the company catches on. I don’t necessarily mean that I wouldn’t want Sam or Demis involved in the ruling council, just that I’d prefer if there were, like… an assigned group of people to directly operate the model, and an oversight committee, with reporting rules, that reports to a larger public audience. Regulations and structure, rather than the whims of one person.
As I said in the article, technically controllable ASIs are the equivalent of an invasive species which will displace humans from Earth politically, economically and militarily.
And I’m saying that, assuming all the technical problems are solved, AI researchers would be the ones in control, and I (mostly) trust them to just not do things like build an AI that acts like an invasive species, or argues for its own rights, or build something that actually deserves such rights.
Maybe some random sociologists on Twitter will call for giving AIs rights, but in the counterfactual world where AI researchers have fine control of their own creations, I expect no one in a position to make decisions on the matter to give such calls any weight.
Even in the world we actually live in, I expect such calls to have little consequence. I do think some of the things you describe are reasonably likely to happen, but the people responsible for making them happen will do so unintentionally, with opinion columnists, government regulations, etc. playing little or no role in the causal process.
What is the basis of this trust? Anecdotal impressions of a few that you know personally in the space, opinion polling data, something else?
A bit of anecdotal impressions, yes, but mainly I just think that, in humans, being smart, conscientious, reflective, etc. enough to be the brightest researcher at a big AI lab is actually pretty correlated with being Good (and also, that once you actually solve the technical problems, it doesn’t take that much Goodness to do the right thing for the collective and not just yourself).
Or, another way of looking at it, I find Scott Aaronson’s perspective convincing, when it is applied to humans. I just don’t think it will apply at all to the first kinds of AIs that people are actually likely to build, for technical reasons.
I think there are way more transhumanists and post-humanists at AGI labs than you imagine. Richard Sutton is a famous example (btw, I’ve just discovered that he moved from DeepMind to Keen Technologies, John Carmack’s venture), and I believe there are many more who disguise themselves for political reasons.
No. You have simplistic and incorrect beliefs about control.
If there are a bunch of companies (DeepMind, Anthropic, Meta, OpenAI, …) and a bunch of regulation efforts and politicians who all get inputs, then the AI researchers will have very little control authority, perhaps as little as the physicists had over the use of the H-bomb.
Where does the control really reside in this system?
Who made the decision to almost launch a nuclear torpedo in the Cuban Missile Crisis?
In the Manhattan Project, there was no disagreement among the physicists, the politicians / generals, and the actual laborers who built the bomb about what they wanted the bomb to do. They were all aligned around trying to build an object that would create the most powerful explosion possible.
As for who had control over the launch button, of course the physicists didn’t have that, and never expected to. But they also weren’t forced to work on the bomb; they did so voluntarily and knowing they wouldn’t be the ones who got any say in whether and how it would be used.
Another difference between an atomic bomb and AI is that the bomb itself had no say in how it was used. Once a superintelligence is turned on, control of the system rests entirely with the superintelligence and not with any humans. I strongly expect that researchers at big labs will not be forced to program an ASI to do bad things against the researchers’ own will, and I trust them not to do so voluntarily. (Again, all in the probably-counterfactual world where they know and understand all the consequences of their own actions.)
In that they wanted the bomb to explode? I think the analogous level of control for AI would be unsatisfactory.
I’m not sure they thought this; I think many expected that by playing along they would have influence later. Tech workers today often seem to care a lot about how products made by their companies are deployed.
The premise of this hypothetical is that all the technical problems are solved—if an AI lab wants to build an AI to pursue the collective CEV of humanity or whatever, they can just get it to do that. Maybe they’ll settle on something other than CEV that is a bit better or worse or just different, but my point was that I don’t expect them to choose something ridiculous like “our CEO becomes god-emperor forever” or whatever.
Yeah, I was probably glossing over the actual history a bit too much; most of my knowledge on this comes from seeing Oppenheimer recently. The actual disanalogy is that no AI researcher would really be arguing for not building and deploying ASI in this scenario, vs. the atomic bomb, where lots of people wanted to build it to have around but not actually use it, or use it only as some kind of absolute last resort. I don’t think many AI researchers in our actual reality have that kind of view on ASI, and probably few to none would have that view in the counterfactual where the technical problems are solved.
Well, these systems aren’t programmed. Researchers work on architecture and engineering; goal content is down to the RLHF that is applied and the wishes of the user(s), and those wishes are in turn shaped by market forces, user preferences, etc. And user preferences may themselves be influenced by other AI systems.
Closed source models can have RLHF and be delivered via an API, but open source models will not be far behind at any given point in time. And of course prompt injection attacks can bypass the RLHF on even closed source models.
The decisions about what RLHF to apply on contentious topics will come from politicians and from the leadership of the companies, not from the researchers. And politicians are influenced by the media and elections, and company leadership is influenced by the market and by cultural trends.
Where does the chain of control ultimately ground itself?
Answer: it doesn’t. Control of AI in the current paradigm is floating. Various players can influence it, but there’s no single source of truth for “what’s the AI’s goal”.
I don’t dispute any of that, but I also don’t think RLHF is a workable method for building or aligning a powerful AGI.
Zooming out, my original point was that there are two problems humanity is facing, quite different in character but both very difficult:
1. A coordination / governance problem, around deciding when to build AGI and who gets to build it.
2. A technical problem, around figuring out how to build an AGI that does what the builder wants at all.
My view is that we are currently on track to solve neither of those problems. But if you actually consider what the world looks like in which we solve even one of them sufficiently completely, it seems like either is enough for a relatively high probability of a relatively good outcome, compared to where we are now.
Both possible worlds are probably weird hypotheticals which shouldn’t have much impact on our actual strategy in the world we actually live in, which is of course to pursue solutions to both problems simultaneously with as much vigor as possible. But it still seems worth keeping in mind that if even one thing works out sufficiently well, we probably won’t be totally doomed.
How does a solution to the above solve the coordination/governance problem?
I think the theory is something like the following: we build the guaranteed-trustworthy AI, ask it to prevent the creation of unaligned AI, and it comes up with the necessary governance structures, and the persuasion and force needed to implement them.
I’m not sure this argument is airtight. Some political actions are simply impossible to accomplish ethically, and therefore unavailable to a “good” actor even given superhuman abilities.
Where did you learn of this?
From what I know, it was the opposite: there were so many disagreements, even just among the physicists, that they decided to duplicate nearly all effort to produce two different nuclear device designs, the gun type and the implosion type, simultaneously.
E.g., both plutonium and uranium processing supply chains were set up, at massive expense (and later environmental damage), just in case one design didn’t work.
Without commenting on whether there was in fact much agreement or disagreement among the physicists, this doesn’t sound like much evidence of disagreement. I think it’s often entirely reasonable to try two technical approaches simultaneously, even if everyone agrees that one of them is more promising.
You do realize that setting up each supply chain alone took up well over 1% of total US GDP, right?
I didn’t know that, but not a crux. This information does not make me think it was obviously unreasonable to try both approaches simultaneously.
(Downvoted for tone.)
How does this relate to the discussion Max H and Roko were having? Or the question I asked of Max H?
I don’t know, I didn’t intend it to relate to those things. It was a narrow reply to something in your comment, and I attempted to signal it as such.
(I’m not very invested in this conversation and currently intend to reply at most twice more.)
Okay then.
So you don’t think a pivotal act exists? Or, more ambitiously, you don’t think a sovereign implementing CEV would result in a good enough world?
Who is going to implement CEV or some other pivotal act?
Ah, I see. Yeah, that’s a reasonable worry. Any ideas on how someone in those orgs could incentivize such behavior whilst discouraging poorly thought out pivotal acts? I would be OK with a future where e.g. OAI gets 90-99% of the cosmic endowment as long as the rest of us get a chunk, or get the chance to safely grow to the point where we have a shot at the vast scraps OAI leaves behind.
The fact that we are having this conversation simply underscores how dangerous this is and how unprepared we are.
This is the future of the universe we’re talking about. It shouldn’t be a footnote!
Do we need both? Perhaps not, in the theoretical case where we get a perfect instance of one. I disagree that we should aim for one or the other, because I don’t expect we will reach anywhere near perfection on either. I think we should expect to have to muddle through somehow with very imperfect versions of each.
I think we’ll likely see some janky, poorly-organized international AI governance attempt, combined with just-good-enough tool AI and software and just-aligned-enough sorta-general AI, to maintain an uneasy, temporary state of suppressing rogue AI explosions.
How long will we manage to stay on top under such circumstances? Hopefully long enough to realize the danger we’re in and scrape together some better governance and alignment solutions.
Edit: I later saw that Max H said he thought we should pursue both. So we disagree less than I thought. There is some difference, in that I still think we can’t really afford a failure in either category, mainly because I don’t expect us to do well enough in either for that single semi-success to carry us through.