But suppose the reset leaves everyone with their memories intact, so we’ve got a chance to ‘learn from our mistakes’.
Then cool, we converge much more quickly to whatever condition satisfies the reset.
Under those conditions, even a fool like me could probably get something to work. Assuming that the answer will actually fit in my mind.
But what? Chances are I’m happy because I don’t realise that everyone else is dead and I’m living in a simulation which exists solely in order to avoid the reset.
What is the reset condition that avoids this sort of thing?
Asking about what reset conditions would avoid this is a bucket error—the rhetorical point is that no such reset is possible; he’s drawing a contrast between normal science and AGI science. I don’t understand why this post and your reply comment are attempting to get into the details of the counterfactual that you and EY both agree is a counterfactual. The whole point is that it can’t happen!
My point is that even if we solve the technical problem of ‘how do we get goals into an AI’, the ‘what values to put in the AI’ problem is also very hard.
So hard that even in the ‘Groundhog Day’ universe, it’s hard.
And yet people just handwave it.
Almost all the survival probability is in ‘we don’t build the AI’, or ‘we’re just wrong about something fundamental’.
Yes, it is hard. But Eliezer isn’t just handwaving it. Here, for example, is a 37-page document he wrote on the subject 19 years ago:
https://intelligence.org/files/CEV.pdf
Sure, I read that a few years after he wrote it, and it’s still probably the best idea, but even if it’s feasible it needs superintelligent help! So we have to solve the alignment problem to do it.
Every time humanity creates an AI capable of massive harm, friendly aliens show up, box it, and replace it with a simulation of what would have happened if it was let loose. Or something like that.
Yes, that’s the sort of thing that would work, but notice that you’ve moved the problem of friendliness into the aliens. If we’ve already got super-powered friendly aliens that can defeat our unaligned superintelligences, we’ve already won and we’re just waiting for them to show up.
But that’s exactly how I interpret Eliezer’s “50 years” comment: if we had those alien friends (or some other reliable guardrails), how long would it take humanity to solve alignment, to the point where we could stop relying on them? Eliezer suggested 50 years or so in the presence of those hypothetical guardrails; without them, we horribly die on the first attempt. No need to go into a deep philosophical discussion on the nature of the hypothetical guardrails when the whole point is that we do not have any.
So, if we already had friendly AI, we’d take 50 years to solve friendly AI?
I am totally nitpicking here. I think everyone sane agrees that we’re doomed and soon. I’m just trying to destroy the last tiny shreds of hope.
Even if we can sort out the technical problem of giving AIs goals and keeping them stable under self-improvement, we are still doomed.
We have two separate impossible problems to solve, and no clue how to solve either of them.
Maybe if we can do ‘strawberry alignment’, we can kick the can down the road far enough for someone to have a bright idea.
Maybe strawberry alignment is enough to get CEV.
But strawberry alignment is *hard*, both the ‘technical problem’ and the ‘what wish’ problem.
I don’t think Groundhog Day is enough. We just end up weirdly doomed rather than straightforwardly doomed.
And actually the more I think about it, the more I prefer straightforward doom. Which is lucky, because that’s what we’re going to get.
I think this “shred of hope” is the root of the disagreement. You are interpreting Eliezer’s 50-year comment as “in some weird hypothetical world, …”, and you are trying to point out that the weird world is so weird that the tiny likelihood we are in that world does not matter. But Eliezer’s comment was about a counterfactual world that we know we are not in, so the specific structure of that counterfactual world does not matter (in fact, it is counterfactual exactly because it is not logically consistent). Basically, Eliezer’s argument is roughly “in a world where unaligned AGI is not a thing that kills us all [not because of some weird structure of a hypothetical world, but simply as a logical counterfactual on the known fact that unaligned AGI results in humanity dying], …”, where the whole point is that we know that is not the world we are in. Does that help? I tried to make the counterfactual world a little more intuitive to think about by introducing friendly aliens and such, but I don’t think that is what was originally meant there.
In what version of reality do you think anyone has hope for an AI alignment Groundhog Day?
Even as a doomer among doomers, you, with respect, come off as a rambling madman.
The problem is that the claim you’re making (that alignment is so doomed that Eliezer Yudkowsky, one of the most pessimistic voices among alignment people, if not the most pessimistic, is still somehow over-optimistic about humanity’s prospects) is unsubstantiated.
It’s a claim, I think, that deserves some substantiation. Maybe you believe you’ve already provided as much. I disagree.
I’m guessing you’re operating on strong intuition here; and you know what, great, share your model of the world! But you apparently made this post with the intention to persuade, and I’m telling you you’ve done a poor job.
EDIT: To be clear, even if I were somehow granted vivid knowledge of the future through precognition, you’d still seem crazy to me at this point.
(I assume you mean vivid knowledge of the future in which we are destroyed, obviously in the case where everything goes well I’ve got some problem with my reasoning)
That’s a good distinction to make, a man can be right for the wrong reasons.
Certainly mad enough to take “madman” as a compliment, thank you!
I’d be interested if you know a general method I could use to tell if I’m mad. The only time I actually know it happened (thyroid overdose caused a manic episode) I noticed pretty quickly and sought help. What test should I try today?
Obviously “everyone disagrees with me and I can’t convince most people” is a bad sign. But after long and patient effort I have convinced a number of unfortunates in my circle of friends. Some of whom have always seemed pretty sharp to me.
And you must admit, the field as a whole seems to be coming round to my point of view!
Rambling I do not take as a compliment. But nevertheless I thank you for the feedback.
I thought I’d written the original post pretty clearly and succinctly. If not, advice on how to write more clearly is always welcome. If you get my argument, can you steelman it?
Your guess is correct, I literally haven’t shifted my position on all this since 2010. Except to notice that everything’s happening much faster than I expected it to. Thirteen years ago I expected this to kill our children. Now I worry that it’s going to kill my parents. AlphaZero was the fire alarm for me. General Game Playing was one of the more important sub-problems.
I agree that if you haven’t changed your mind for thirteen years in a field that’s moving fast, you’re probably stuck.
I think my basic intuitions are:
“It’s a terrible idea to create a really strong mind that doesn’t like you.”
“Really strong minds are physically possible, humans are nowhere near.”
“Human-level AI is easy because evolution did it to us, quickly, and evolution is stupid.”
“Recursive self-improvement is possible.”
Which of these four things do you disagree with? Or do you think the four together are insufficient?
I get that your argument is essentially as follows:
1.) Solving the problem of what values to put into an AI, even granting that the other technical issues are solved, is impossibly difficult in real life.
2.) To demonstrate that impossibility, here is a much kinder version of reality in which the problem still remains impossible.
I don’t think you did 2, and it requires me to already accept that 1 is true, which I think it probably isn’t; and I think most would agree with me on this point, at least in principle.
I don’t disagree with any of them. I doubt there’s a convincing argument that could get me to disagree with any of those as presented.
What I am not convinced of is that, given all those assumptions being true, certain doom necessarily follows, or that there is no humanly tractable scheme which avoids doom in whatever time we have left.
I’m not clever enough to figure out what the solution is, mind you, nor am I especially confident that someone else is going to. Please don’t confuse me for someone who doesn’t often worry about these things.
OK, cool, I mean “just not building the AI” is a good way to avoid doom, and that still seems at least possible, so we’re maybe on the same page there.
And I think you got what I was trying to say, solving 1 and/or 2 can’t be done iteratively or by patching together a huge list of desiderata. We have to solve philosophy somehow, without superintelligent help. As I say, that looks like the harder part to me.
I promise I’ll try not to!
None. But if a problem’s not solvable in an easy case, it’s not solvable in a harder case.
Same argument as for thinking about Solomonoff Induction or Halting Oracles. If you can’t even do it with magic powers, that tells you something about what you can really do.
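The Halting Oracle half of that analogy can be made concrete with the standard diagonalization sketch. This is purely illustrative; `halts` and `troublemaker` are made-up names for the argument’s moving parts, not any real API:

```python
# Classic diagonalization: even granting a hypothetical oracle
# halts(program, data) that decides halting, we can build a program
# that defeats it, so no such total, correct oracle can exist.

def halts(program, data):
    # Hypothetical "magic power": returns True iff program(data) halts.
    # The argument shows this function cannot exist, which is the point.
    raise NotImplementedError("no such oracle is possible")

def troublemaker(program):
    # Do the opposite of whatever the oracle predicts about
    # running `program` on itself.
    if halts(program, program):
        while True:  # oracle said "halts", so loop forever
            pass
    return None      # oracle said "loops", so halt immediately

# Asking about troublemaker(troublemaker) yields a contradiction either
# way: whatever halts() answers, troublemaker does the opposite.
```

The relevant lesson is the one stated above: if the task is impossible even when you grant yourself magic powers, that tells you something about what you can really do without them.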