Every time humanity creates an AI capable of massive harm, friendly aliens show up, box it, and replace it with a simulation of what would have happened if it was let loose. Or something like that.
Yes, that’s the sort of thing that would work, but notice that you’ve moved the problem of friendliness into the aliens. If we’ve already got super-powered friendly aliens that can defeat our unaligned superintelligences, we’ve already won and we’re just waiting for them to show up.
But that’s exactly how I interpret Eliezer’s “50 years” comment—if we had those alien friends (or some other reliable guardrails), how long would it take humanity to solve alignment, to the point where we could stop relying on them? Eliezer suggested around 50 years in the presence of hypothetical guardrails; without them, we horribly die on the first attempt. No need to go into a deep philosophical discussion on the nature of hypothetical guardrails, when the whole point is that we do not have any.
So, if we already had friendly AI, we’d take 50 years to solve friendly AI?
I am totally nitpicking here. I think everyone sane agrees that we’re doomed and soon. I’m just trying to destroy the last tiny shreds of hope.
Even if we can sort out the technical problem of giving AIs goals and keeping them stable under self-improvement, we are still doomed.
We have two separate impossible problems to solve, and no clue how to solve either of them.
Maybe if we can do ‘strawberry alignment’, we can kick the can down the road far enough for someone to have a bright idea.
Maybe strawberry alignment is enough to get CEV.
But strawberry alignment is *hard*, both the ‘technical problem’ and the ‘what wish’ problem.
I don’t think Groundhog Day is enough. We just end up weirdly doomed rather than straightforwardly doomed.
And actually the more I think about it, the more I prefer straightforward doom. Which is lucky, because that’s what we’re going to get.
I think this “shred of hope” is the root of the disagreement. You are interpreting Eliezer’s 50-year comment as “in some weird hypothetical world, …” and trying to point out that the weird world is so weird that the tiny likelihood we are in it does not matter. But Eliezer’s comment was about a counterfactual world that we know we are not in, so the specific structure of that counterfactual world does not matter (in fact, it is counterfactual exactly because it is not logically consistent). Basically, Eliezer’s argument is roughly “in a world where unaligned AI is not a thing that kills us all [not because of some weird structure of a hypothetical world, but just as a logical counterfactual on the known fact that ‘unaligned AGI’ results in humanity dying], …”, where the whole point is that we know that’s not the world we are in. Does that help? I tried to make the counterfactual world a little more intuitive to think about by introducing friendly aliens and such, but that’s not what was originally meant there, I think.
In what version of reality do you think anyone has hope for an AI alignment Groundhog Day?
Even as a doomer among doomers, you, with respect, come off as a rambling madman.
The problem is that the claim you’re making, namely that alignment is so doomed that Eliezer Yudkowsky, one of the most pessimistic voices among alignment people if not the most pessimistic, is still somehow over-optimistic about humanity’s prospects, is unsubstantiated.
It’s a claim, I think, that deserves some substantiation. Maybe you believe you’ve already provided as much. I disagree.
I’m guessing you’re operating on strong intuition here; and you know what, great, share your model of the world! But you apparently made this post with the intention to persuade, and I’m telling you you’ve done a poor job.
EDIT: To be clear, even if I were somehow granted vivid knowledge of the future through precognition, you’d still seem crazy to me at this point.
(I assume you mean vivid knowledge of the future in which we are destroyed; obviously, in the case where everything goes well, I’ve got some problem with my reasoning.)
That’s a good distinction to make: a man can be right for the wrong reasons.
Certainly mad enough to take “madman” as a compliment, thank you!
I’d be interested if you know a general method I could use to tell whether I’m mad. The only time I know it actually happened (a thyroid overdose caused a manic episode), I noticed pretty quickly and sought help. What test should I try today?
Obviously “everyone disagrees with me and I can’t convince most people” is a bad sign. But after long and patient effort I have convinced a number of unfortunates in my circle of friends, some of whom have always seemed pretty sharp to me.
And you must admit, the field as a whole seems to be coming round to my point of view!
Rambling I do not take as a compliment. But nevertheless I thank you for the feedback.
I thought I’d written the original post pretty clearly and succinctly. If not, advice on how to write more clearly is always welcome. If you get my argument, can you steelman it?
Your guess is correct: I literally haven’t shifted my position on all this since 2010, except to notice that everything’s happening much faster than I expected it to. Thirteen years ago I expected this to kill our children. Now I worry that it’s going to kill my parents. AlphaZero was the fire alarm for me. General Game Playing was one of the more important sub-problems.
I agree that if you haven’t changed your mind for thirteen years in a field that’s moving fast, you’re probably stuck.
I think my basic intuitions are:
“It’s a terrible idea to create a really strong mind that doesn’t like you.”
“Really strong minds are physically possible, humans are nowhere near.”
“Human-level AI is easy because evolution did it to us, quickly, and evolution is stupid.”
“Recursive self-improvement is possible.”
Which of these four things do you disagree with? Or do you think the four together are insufficient?
I get that your argument is essentially as follows:
1.) Solving the problem of what values to put into an AI, even given that the other technical issues are solved, is impossibly difficult in real life.
2.) To prove the problem’s impossible difficulty, here’s a much kinder version of reality where the problem still remains impossible.
I don’t think you did 2, and it requires me to already accept that 1 is true, which I think it probably isn’t; and I think most would agree with me on this point, at least in principle.
I don’t disagree with any of them. I doubt there’s a convincing argument that could get me to disagree with any of those as presented.
What I am not convinced of is that, given all those assumptions are true, certain doom necessarily follows, or that there is no possible humanly tractable scheme which avoids doom in whatever time we have left.
I’m not clever enough to figure out what the solution is, mind you, nor am I especially confident that someone else necessarily will. Please don’t confuse me for someone who doesn’t often worry about these things.
OK, cool, I mean “just not building the AI” is a good way to avoid doom, and that still seems at least possible, so we’re maybe on the same page there.
And I think you got what I was trying to say: solving 1 and/or 2 can’t be done iteratively or by patching together a huge list of desiderata. We have to solve philosophy somehow, without superintelligent help. As I say, that looks like the harder part to me.
I promise I’ll try not to!
None. But if a problem’s not solvable in an easy case, it’s not solvable in a harder case.
Same argument as for thinking about Solomonoff Induction or Halting Oracles. If you can’t even do it with magic powers, that tells you something about what you can really do.
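(A minimal sketch of that move, purely illustrative and not from the original exchange, written in Python; the `halts` oracle here is a hypothetical magic power, and all the names are mine rather than anything from the thread.)

```python
# Classic halting-problem diagonalization: grant yourself the "magic power"
# (a halting oracle) and show a contradiction still follows, which is what
# tells you no real decider can exist.  `halts` is assumed, never implemented.

def halts(program, argument) -> bool:
    """Hypothetical oracle: True iff program(argument) eventually stops."""
    raise NotImplementedError("magic power -- assumed for the argument only")

def paradox(program):
    """Do the opposite of whatever the oracle predicts about program(program)."""
    if halts(program, program):
        while True:   # oracle says it halts, so loop forever
            pass
    return            # oracle says it loops, so halt immediately

# paradox(paradox) halts exactly when the oracle says it does not.  The
# contradiction arrives even with the magic power granted -- the same shape of
# argument as "even with friendly-alien guardrails or a Groundhog Day loop,
# the 'what do we wish for' problem is still unsolved."
```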