Do you believe that you are grasping something, on an intellectual and/or conceptual level, that seems objective to you but that others (“others” being people doing research at least remotely relevant to alignment, or knowledgeable people in the EA/LW/rationalist-and-rationalist-adjacent communities who are more optimistic than you) are failing to grasp, and that they are therefore missing the “truth” that alignment is so inhumanly difficult?
Yes, I think so. It seems to me that ‘saying what the good is’ has been a two-thousand-year philosophical project on which we’ve made very little progress. Getting it formally defined within the next few years, to the point where I could write a computer program to tell me which possible outcomes are good, just looks like an impossible task.
E.g. We all think that whether a being is conscious makes some moral difference. But we aren’t even close to being able to tell whether a being is conscious in that sense. I’ve never heard anyone give a sensible description of what the ‘hard problem’ even is. That’s one of the hard things about it.
And our formal definition of ‘the good’ needs to be correct. A few weird edge cases that fail under heavy optimization pressure just lead to a paperclipper with weird paperclips, some parody of what we might actually have wanted.
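To make that failure mode concrete, here’s a purely illustrative toy (the names and numbers are mine, not anyone’s actual proposal): a measurable proxy that tracks the thing we care about in ordinary situations comes apart from it as soon as something searches hard for the proxy’s maximum.

```python
# Purely illustrative toy: "smiles" stands in for an easy-to-measure proxy,
# "wellbeing" for the hard-to-define thing we actually care about. On ordinary
# states they track each other, so the proxy looks fine -- until something
# optimizes it hard.
import random

random.seed(0)

def sample_ordinary_state():
    # In worlds like today's, the proxy and the real thing are correlated.
    wellbeing = random.uniform(0, 10)
    return {"smiles": wellbeing + random.uniform(-1, 1), "wellbeing": wellbeing}

def sample_any_state():
    # A powerful optimizer can reach weird corners of state space where the
    # two come apart (e.g. faces fixed into permanent smiles).
    return {"smiles": random.uniform(0, 100), "wellbeing": random.uniform(0, 10)}

proxy = lambda s: s["smiles"]          # what we told the machine to maximize
true_value = lambda s: s["wellbeing"]  # what we actually wanted

mild = max((sample_ordinary_state() for _ in range(100)), key=proxy)
hard = max((sample_any_state() for _ in range(1_000_000)), key=proxy)

print("mild optimization : proxy", round(proxy(mild), 1),
      "true value", round(true_value(mild), 1))
print("heavy optimization: proxy", round(proxy(hard), 1),
      "true value", round(true_value(hard), 1))
# The harder the search, the more the winner is an edge case the proxy
# mis-scores: the proxy score soars while the true value is left to chance.
```

Mild optimization looks fine; heavy optimization reliably finds states with a sky-high proxy score whose true value is left entirely to chance, which is roughly where the weird paperclips live.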
For all I know, a universe full of computronium having one vast orgasm really is the highest good. But that seems to be an outcome that we don’t want. Who can say why?
Eliezer himself explained how hopelessly complex and incoherent human values are.
Probably we’d need superhuman help to work out some sort of Coherent Extrapolated Volition (even assuming that makes any sense at all). But creating superhuman help seems to kill us all.
MIRI spent the last ten years or so pursuing the sorts of mathematically rigorous approaches that might, eventually, after a few decades of top-class mathematical effort, solve the easy bit of the problem: ‘given a utility function, make it so’. And as far as I know they discovered that it was all quite a lot harder than it looked. And mathematically rigorous attacks seem not to be the sort of thing that current AI methods are amenable to anyway.
No one’s attacking ‘what should that utility function look like?’.
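To show the split I mean, here’s a minimal sketch of the ‘easy bit’ (the names and types are mine, purely illustrative, nobody’s actual formalism): generic expected-utility machinery that will ‘make it so’ for whatever utility function you hand it. Nothing in it even hints at what that function should be.

```python
# Minimal sketch of 'given a utility function, make it so': pick the action
# with the highest expected utility under some world model. The machinery is
# generic; the one thing it cannot supply is the `utility` argument itself.
from typing import Callable, Dict, Hashable, List

Action = Hashable
Outcome = Hashable

def best_action(
    actions: List[Action],
    outcome_model: Callable[[Action], Dict[Outcome, float]],  # action -> P(outcome)
    utility: Callable[[Outcome], float],                       # the undefined hard part
) -> Action:
    """Return the action with the highest expected utility under the model."""
    def expected_utility(a: Action) -> float:
        return sum(p * utility(o) for o, p in outcome_model(a).items())
    return max(actions, key=expected_utility)

# Toy usage: the machinery happily "makes it so" for whatever utility you hand
# it, sensible or not.
toy_model = lambda a: {f"world after '{a}'": 1.0}
print(best_action(["make paperclips", "cure disease"], toy_model,
                  utility=lambda o: o.count("paperclip")))  # prints "make paperclips"
```

The whole difficulty this thread is about lives in that one `utility` argument.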
My main worry for the future is that people trying to build aligned AIs will succeed just well enough to create something that’s worse than just destroying everything. But I do think that even that is quite beyond us.
Whereas building a superintelligence out of random bits of crap, one that will just set off and do random things really well, seems to be well within our current powers; a great many people are hell-bent on doing just that, and it will be here soon.
So the situation seems to me a bit like ‘some homeless lunatic in Hiroshima trying to build a bomb-proof umbrella vs. the Manhattan project’.
Seriously, that’s all I’ve got. On the side of doom: a buggerload of brilliant, motivated people working on a very tractable-looking problem. On the side of continued human existence: some guys, no plan, no progress, and the problem looks impossible.
I name the political movement that I cannot see any reason to start: “Ineffective Doomerism”. If there’s a positive singularity (and quantum suicide makes me think I might see one!), y’all have my permission to laugh at me for the rest of time.
So, and please correct me if I’m wrong, would you say that the main source of your hopelessness comes from the idea of human values being too complex to correctly program into anything? Because I basically agree with that idea, but it doesn’t really inspire much doomerism in me. I always believed that trying to “solve” ethics was pretty much futile, even before I was introduced to LW, but I never gave it much weight in terms of how much it affects alignment, for the following reason:
I just don’t expect that any of the clever people who I tend to defer to are actually trying to do exactly this; “this” being trying to actually, literally reverse-engineer human values and then encode them.
The idea seems obviously wrong enough that I honestly don’t believe that anyone working in the alignment field who thinks the problem is solvable, at least from a technical standpoint (Paul Christiano, Richard Ngo, Nate Soares, etc.), hasn’t already considered it.
However, our conversation here has inspired me to ask a question regarding this in the latest monthly AGI safety questions thread.
That was kind of a long-term source of hopelessness: why I thought Eliezer’s plan wouldn’t work out without a very long time and lots of people working on it. But my current source of short-term hopelessness is that it looks like we’re right on the verge of achieving AGI, and no one seems to be taking the danger remotely seriously.
It’s like being in a petrol warehouse with a load of monkeys striking matches. We just die by default now, unless something really drastic and surprising happens.
Well, we can agree that the default outcome is probably death.
So, in my previous comment, I explained why I tend not to think Complexity of Value necessarily dooms us. I doubt you find that reasoning remotely reassuring, but I’d be interested in finding out why you think it shouldn’t be. Would you be willing to try and explain that to me?
Hi, so I don’t understand why you’re not worried except that “some clever people don’t seem worried”.
But actually I think all those guys are in fact quite worried. If they aren’t full-on doomers, then I don’t understand what they’re hoping to do.
So I’ll repeat my argument:
(1) We’re about to create a superintelligence. This is close and there’s no way to stop it.
(2) If we create a superintelligence, then whatever it wants is what is going to happen.
(3) If that’s not what we want, that’s very bad.
(4) We have no idea what we want, not even roughly, let alone in the sense of formal specification.
That’s pretty much it. Which bit do you disagree with?
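If it helps, here’s the same argument rendered as a toy bit of propositional logic (my own sketch; the proposition names are mine, and I’ve made explicit a bridging premise that the prose leaves implicit). It only shows that the conclusion follows from the premises; the live question is which premise you reject.

```lean
-- Toy rendering of the argument above (my sketch, not a formalization anyone
-- has endorsed). `bridge` is the implicit step: if we can't say what we want,
-- we can't make the machine's goals match ours.
variable (WeBuildASI ItsGoalsWin GoalsMatchOurs WeCanSayWhatWeWant VeryBad : Prop)

theorem doom_argument
    (p1 : WeBuildASI)                                   -- premise (1)
    (p2 : WeBuildASI → ItsGoalsWin)                     -- premise (2)
    (p3 : ItsGoalsWin ∧ ¬GoalsMatchOurs → VeryBad)      -- premise (3)
    (p4 : ¬WeCanSayWhatWeWant)                          -- premise (4)
    (bridge : ¬WeCanSayWhatWeWant → ¬GoalsMatchOurs) :  -- left implicit in the prose
    VeryBad :=
  p3 ⟨p2 p1, bridge p4⟩
```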
I never meant to claim that my position was “clever people don’t seem worried, so I shouldn’t be”. If that’s what you got from me, then that’s my mistake. I’m incredibly worried as a matter of fact, and, much more importantly, everyone I mentioned is too, to one extent or another, as you already pointed out. What I meant to say, but failed to, was that there’s enough disagreement in these circles that near-absolute confidence in doom seems to be jumping the gun. That argument also very much holds against people who are so certain that everything will go just fine.
I guess most of my disagreement comes from (4). Or rather, from the implication that having an exact formal specification of human values ready to be encoded is necessarily the only way that things could possibly go well. I already tried to verbalize as much earlier, but maybe I didn’t do a good job of that either.
I wouldn’t call my confidence in doom near-absolute, so much as “very high”! I would have been just as much of a doomer in 1950, the last time AI looked imminent, before it was realized that “the hard things are easy and the easy things are hard”.
I wouldn’t be that surprised if it turned out that we’re still a few fundamental discoveries away from AGI. My intuition is telling me that we’re not.
But the feeling that we might get away with it comes only from a sense that I can easily be wrong about stuff. I would feel the same if I’d been transported back to 1600, made myself a telescope, and observed a comet heading for Earth, with no one willing to listen.
“Within my model”, as it were, yes, near-absolute is a fair description.
The long-term problem is that an agent is going to have a goal, and most goals kill us. We get to make exactly one wish, and that wish will come true whether we want it or not. Even if the world were sane, this would be a very, very dangerous situation. I would want to see very strong mathematical proof that such a thing was safe before trying it, and I’d still expect it to kill everyone.
The short term problem is that we’re not even trying. People all over the place are actively building more and more general agents that make plans, with just any old goals, without apparently worrying about it, and they don’t believe there’s a problem.
What on earth do you think might stop the apocalypse? I can imagine something like “take over the world, destroy all computers” might work, but that doesn’t look feasible without superintelligent help, and that puts us in the situation where we have a rough idea what we want, but we still need to find out how to express that formally without it leading to the destruction of all things.
As a very wise man once said: “The only genie to which it is safe to make a wish is one to which you don’t need to make a wish, because it already knows what you want and it is on your side.”