There seems to be a lack of emphasis in this market on outcomes where alignment is not solved, yet humanity turns out fine anyway. Based on an Outside View perspective (where we ignore any specific arguments about AI and just treat it like any other technology with a lot of hype), wouldn’t one expect this to be the default outcome?
Take the following general heuristics:
If a problem is hard, it probably won’t be solved on the first try.
If a technology gets a lot of hype, people will think that it’s the most important thing in the world even if it isn’t. At most, it will only be important on the same level that previous major technological advancements were important.
People may be biased towards thinking that the narrow slice of time they live in is the most important period in history, but statistically this is unlikely.
If people think that something will cause the apocalypse or bring about a utopian society, historically speaking they are likely to be wrong.
This, if applied to AGI, leads to the following conclusions:
Nobody manages to completely solve alignment.
This isn’t a big deal, as AGI turns out to be disappointingly not that powerful anyway (or at most “creation of the internet” level influential, but not “disassemble the planet’s atoms” level influential).
I would expect the average person outside of AI circles to default to this kind of assumption.
It seems like the only option fully compatible with this perspective is
G. It’s impossible/improbable for something sufficiently smarter and more capable than modern humanity to be created, that it can just do whatever without needing humans to cooperate; nor does it successfully cheat/trick us.
which is one of the lowest probabilities on the market. I’m guessing that this is probably due to the fact that people participating in such a market are heavily selected from those who already have strong opinions on AI risk?
People may be biased towards thinking that the narrow slice of time they live in is the most important period in history, but statistically this is unlikely.
If people think that something will cause the apocalypse or bring about a utopian society, historically speaking they are likely to be wrong.
Part of the problem with these two is that whether an apocalypse happens or not often depends on whether people took the risk of it happening seriously. We absolutely could have had a nuclear holocaust in the ’70s and ’80s; one of the reasons we didn’t is that people took the risk seriously and took steps to avert it.
And, of course, whether a time slice is the most important in history will, in retrospect, depend on whether you actually had an apocalypse. The ’70s would have seemed a lot more momentous if we had launched all of our nuclear warheads at each other.
For my part, my bet would be on something like:
O. Early applications of AI/AGI drastically increase human civilization’s sanity and coordination ability; enabling humanity to solve alignment, or slow down further descent into AGI, etc. (Not in principle mutex with all other answers.)
But more specifically:
P. Red-teams evaluating early AGIs demonstrate the risks of non-alignment in a very vivid way; they demonstrate, in simulation, dozens of ways in which the AGI would try to destroy humanity. This has an effect on world leaders similar to observing nuclear testing: It scares everyone into realizing the risk, and everyone stops improving AGI’s capabilities until they’ve figured out how to keep it from killing everyone.
What, exactly, is this comment intended to say?
Sorry—that was my first post on this forum, and I couldn’t figure out the editor. I didn’t actually click “submit”; I accidentally hit a key combo that the editor interpreted as “submit”.
I’ve now edited it to say what I was trying to get at in the first place.
If a problem is hard, it probably won’t be solved on the first try.
If a technology gets a lot of hype, people will think that it’s the most important thing in the world even if it isn’t. At most, it will only be important on the same level that previous major technological advancements were important.
People may be biased towards thinking that the narrow slice of time they live in is the most important period in history, but statistically this is unlikely.
If people think that something will cause the apocalypse or bring about a utopian society, historically speaking they are likely to be wrong.
I basically suspect that this is the best argument I’ve seen for why AI Alignment doesn’t matter, and the best argument for why business as usual would continue, and the best argument against Holden Karnofsky’s series on why we live in a pivotal time.
I think that while the outside-view arguments for why we survive AGI are defeatable, they do actually need to be rebutted. The arguments are surprisingly good, and IMO this is the weakest part of LWers’ case for AGI being a big deal, at least right now.
LWers need to actually argue for why AGI will be the most important invention in history, or at least that it will be a big deal at all.
More importantly, I kinda wish LWers would stop applying a specialness assumption everywhere and treating inside views as the supermajority of their evidence.
Instead, LWers need to argue for why something is special and can’t be modeled properly by the outside view, and show that work.
I basically suspect that this is the best argument I’ve seen for why AI Alignment doesn’t matter
I agree, under the proviso that “best” does not equal “good” or even “credible”.
I think The Sequences spend a lot of words making these arguments, not to mention the enormous quantity of more recent content on LessWrong. Much of Holden’s recent writing has been dedicated to making this exact argument. The case for AGI being singularly impactful does feel pretty overdetermined to me based on the current arguments, so my view is that the ball is in the other court for proactively arguing against the current set of arguments in favor.
Let’s address the sources, one by one:
I think The Sequences spend a lot of words making these arguments,
To be a little blunt, the discussion of AGI is probably the weakest point of the Sequences, primarily because it gets a lot of things flat-out wrong. To be fair, Eliezer was writing before the endgame, when massive, successful investment in AI took off, so he was bound to get some things wrong.
Some examples of where he was wrong on AI:
It ultimately turned out that AI boxing does work, and Eliezer was flat wrong.
He was wrong in his idea that deep learning could never scale to AGI, and his dismissal of neural networks is the most clearly wrong claim I’ve seen in the Sequences, primarily because the human brain, which acts like a neural network, is far more efficient than he assumed, and arguably close to the optimal design, at least for classical, non-exotic computers. At most, you’d get a 1 OOM improvement to the efficiency of the design.
To be blunt, Eliezer is severely unreliable as a source on AGI.
Next, I’ll address this:
not to mention the enormous quantity of more recent content on LessWrong.
Mostly, this content is premised on the assumption that AGI is a huge deal. Little content on LW actually tries to show why AGI would be a huge deal without assuming it upfront.
Lastly, I’ll deal with this source:
Much of Holden’s recent writing has been dedicated to making this exact argument.
This is way better as an actual source, and indeed it’s probably the closest any writing on LW has come to asking whether AGI is a huge deal without assuming it.
So I have one good source, one irrelevant source and one bad to terrible source on the question of whether AGI is a huge deal. The good source is probably enough to at least take LW arguments for AI seriously, though without at least a fragment of the assumption that AGI is a huge deal, one probably can’t get very certain (say, over 90% probability).
It ultimately turned out that AI boxing does work, and Eliezer was flat wrong.
This is so wrong that I suspect you mean something completely different from the common understanding of the concept.
He was wrong in his idea that deep learning could never scale to AGI, and his dismissal of neural networks is the most clearly wrong claim I’ve seen in the Sequences, primarily because the human brain, which acts like a neural network, is far more efficient than he assumed, and arguably close to the optimal design, at least for classical, non-exotic computers. At most, you’d get a 1 OOM improvement to the efficiency of the design.
This is not a substantial part of his model of AGI, and why one might expect it to be impactful.
Mostly, this content is premised on the assumption that AGI is a huge deal. Little content on LW actually tries to show why AGI would be a huge deal without assuming it upfront.
Of course plenty of more recent content on LessWrong operates on the background assumption that AGI is going to be a big deal, in large part because the arguments to that effect are quite strong and the arguments against are not. It is at the same time untrue that those arguments don’t exist on LessWrong.
So I have one good source, one irrelevant source and one bad to terrible source on the question of whether AGI is a huge deal.
There are many other sources of such arguments on LessWrong; that’s just the one that came to mind in the first five seconds. If you are going to make strong, confident claims about core subjects on LessWrong, you have a duty to have done the necessary background reading to understand at least the high-level outline of the existing arguments on the subject (including the fact that such arguments exist).
While I still have issues with some of the evidence shown, I’m persuaded enough that I’ll take it seriously and retract my earlier comment on the subject.
I think this comment isn’t rigorous enough for Noosphere89 to retract the comment it responds to, but that’s up to him.
Claims of the form “Yudkowsky was wrong about things like mind-design space, the architecture of neural networks (specifically how he thought making large generalizations about the structure of the human brain wouldn’t work for designing neural architectures), and in general, probably his tendency to assume that certain abstractions just don’t apply whenever intelligence or capability is scaled way up” have, I think, been argued well enough by now that they have at least some merit to them.
The claim about AI boxing I’m not sure about, but my understanding is that it’s currently being debated (somewhat hotly). [Fill in the necessary details where this comment leaves a void, but I think this is mainly about GPT-4’s API being embedded into apps where it can execute code on its own, and things like that.]
Claims of the form “Yudkowsky was wrong about things like mind-design space, the architecture of neural networks (specifically how he thought making large generalizations about the structure of the human brain wouldn’t work for designing neural architectures), and in general, probably his tendency to assume that certain abstractions just don’t apply whenever intelligence or capability is scaled way up.”
This is what I was gesturing at in my comments.
The claim about AI boxing I’m not sure about, but my understanding is that it’s currently being debated (somewhat hotly).
I’m talking about simboxing, which was shown to work by Jacob Cannell here:
https://www.lesswrong.com/posts/WKGZBCYAbZ6WGsKHc/love-in-a-simbox-is-all-you-need
Basically, as long as we can manipulate the AI’s perception of reality, which is trivial to do in offline learning, it’s easy to recreate a finite-time Cartesian agent: data passes only through approved channels, the AI updates its state to learn new things, and this repeats until the end of offline learning.
Thus simboxing is achieved.
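To make this concrete, here is a toy sketch of the loop I have in mind. It’s my own illustration, not code from Cannell’s post, and the names (SimulatedWorld, CartesianAgent, run_simbox) are made up for the example; the only point is that, during offline learning, every bit the agent sees or emits passes through channels the overseer controls, for a finite number of steps.
```python
# Toy illustration only: a "simbox" offline-learning loop where the agent's
# whole world is simulated and all I/O goes through approved channels.

class SimulatedWorld:
    """Stand-in for a fully simulated, offline environment."""

    def observe(self) -> str:
        # The agent's entire "reality" is whatever the simulation chooses to show it.
        return "synthetic observation"

    def apply(self, action: str) -> None:
        # Actions only ever affect the simulation, never anything outside it.
        pass


class CartesianAgent:
    """Finite-time agent whose only I/O is the approved observation/action channel."""

    def __init__(self):
        self.state = {}

    def act(self, observation: str) -> str:
        return f"action given {observation!r}"

    def update(self, observation: str, action: str) -> None:
        # Learning happens entirely inside the box.
        self.state[observation] = action


def run_simbox(agent, world, steps):
    """Offline-learning loop: data passes only through the approved channels."""
    for _ in range(steps):            # finite time, then the episode ends
        obs = world.observe()         # approved input channel
        action = agent.act(obs)       # the agent never touches anything outside the sim
        world.apply(action)           # approved output channel (into the simulation only)
        agent.update(obs, action)     # state update, repeated until offline learning ends


run_simbox(CartesianAgent(), SimulatedWorld(), steps=100)
```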
The reason I retracted my comment is that this quote was correct:
Of course plenty of more recent content on LessWrong operates on the background assumption that AGI is going to be a big deal, in large part because the arguments to that effect are quite strong and the arguments against are not. It is at the same time untrue that those arguments don’t exist on LessWrong.
Primarily because of the post below. There are some caveats to this, but it largely goes through.
Post below:
https://www.lesswrong.com/posts/3nMpdmt8LrzxQnkGp/ai-timelines-via-cumulative-optimization-power-less-long