Draft in progress. Common failure modes for AI posts that I want to reference later:
Trying to help with AI Alignment
“Let’s make the AI not do anything.”
This is essentially a very expensive rock. Other people will be building AIs that do do stuff. How does your AI help the situation over not building anything at all?
“Let’s make the AI do [some specific thing that seems maybe helpful when parsed as an English sentence], without actually describing how to make sure it does exactly, or even approximately, what that English sentence says.”
The problem is a) we don’t know how to point an AI at doing anything at all, and b) your simple English sentence includes a ton of hidden assumptions.
(Note: I think Mark Xu sort of disagreed with Oli on something related to this recently, so I don’t know that I consider this class of solution completely settled. I think Mark Xu thinks that we don’t currently know how to get an AI to do moderately complicated actions with our current tech, but that our current paradigms for how to train AIs are likely to yield AIs that can do moderately complicated actions.)
I think the typical new user who says things like this still isn’t advancing the current paradigm though, nor saying anything useful that hasn’t already been said.
Arguing Alignment is Doomed
[less well formulated]
Lately there’s been a crop of posts arguing alignment is doomed. I… don’t even strongly disagree with them, but they tend to be poorly argued and seem confused about what good problem solving looks like.
Arguing AI Risk is a dumb concern that doesn’t make sense
Lately (in particular since Eliezer’s TIME article), we’ve had a wave of people coming in to say we’re a bunch of doomsday cultists and/or gish gallopers.
And, well, if you’re just tuning in via the TIME article, or you’ve only been paying bits of attention over the years, I think this is a kinda reasonable belief-state to have. From the outside, when you hear an extreme-sounding claim, it’s reasonable for alarm bells to go off and for you to assume it’s maybe crazy.
If you were the first person bringing up this concern, I’d be interested in your take. But, we’ve had a ton of these, so you’re not saying anything new by bringing it up.
You’re welcome to post your take somewhere else, but if you want to participate on LessWrong, you need to engage with the object-level arguments.
Here are a couple of things I’ll say about this:
One particularly gishgallopy-feeling thing is that many arguments for AI catastrophe are disjunctive. So, yeah, there’s not just one argument you can overturn and then we’ll all be like “okay great, we can change our minds about this problem.” BUT, it is the case that we are pretty curious about individual arguments getting overturned. If individual disjunctive arguments turned out to be flawed, that’d make the problem easier. So I’d be fairly excited about someone who digs into the details of various claims in AGI Ruin: A List of Lethalities and either disproves them or finds a way around them.
Another potentially gishgallopy-feeling thing: if you’re discussing things on LessWrong, you’ll be expected to have absorbed the concepts from the sequences (such as how to think about subjective probability, tribalism, etc.), either by reading the sequences or by lurking a lot. I acknowledge this is pretty gishgallopy at first glance, if you came here to debate one particular thing. Alas, that’s just how it is. [see other FAQ question delving more into this]