Those really don’t look too bad to me! (It’s 2022.) We’re all starting to think AI transcendence is ‘within the decade’, even though no-one’s trying to do it deliberately any more.
And nanowar before 2015? Well, we just saw (2019) an accidental release of a probably-engineered virus. How far away can a deliberate release be?
Not bad for 1999.
In 2010, I wrote: https://johnlawrenceaspden.blogspot.com/2010/12/all-dead-soon.html
At the time Eliezer was still very optimistic. I thought things would take longer than he expected, but also that the AI alignment project was hopeless. As I remember, I thought AI was unlikely to kill me personally, but very likely to kill my friends’ children.
Updating after ten years, I was less wrong about the hopelessness, and he was less wrong about the timelines.
I think it’s way too much of a stretch to say that gain-of-function-virus lab escape is “nanowar.”
A stretch, agreed, but ‘deliberately released tiny self-replicating thing engineered to kill’ sure would sound like nanowar, so what we’re short of is intent, rather than capability.
I’d be amazed if the full-strength horrors weren’t sitting ready in shady military labs around the world. In fact, if there aren’t any in Porton Down, then what the hell have they been doing with my taxes?
PS. I enjoyed the main article here very much indeed. Well done.
I read the blogspot post, and in the comments you said that even if every mind on the planet were to work on the problem we would still have almost no chance. Unless you know something nobody else does, this seems, and please forgive my bluntness, batshit crazy.
I understand the argument about how accurately judging difficulty is something that’s usually only doable when you’re already in a place where you can kind of see a solution, even if this argument doesn’t remotely register to me as the total knockdown I intuit you think it is. Even if I did totally agree that it was as bad a sign as you believe it is, I still don’t see how it warrants that level of pessimism.
You’ve claimed before that your hopelessness when it comes to alignment is based only on a strong intuition, and that you don’t believe you “know” anything that others don’t. I find this claim increasingly hard to believe given the near-total confidence of your repeated predictions about our odds.
Maybe asking you if you think you know something others don’t about alignment is a wrong question, so instead I’ll make a (hopefully) different attempt and ask the following: Do you believe that you are grasping something that seems objective to you on an intellectual and/or conceptual level that others (“others” being people doing research that is at least remotely relevant to alignment, or knowledgeable people in the EA/LW/Rat-and-rat-adjacent communities who are more optimistic than you) are failing to grasp, and which is therefore not availing them of the “truth” that alignment is so inhumanly difficult? (If so, but you worry that your answer might result in you stepping on someone else’s toes, then feel free to message me your honest answer in DMs. You can ignore this offer if you have no such worries.)
If no, then I find it extremely difficult to sympathize with your efforts elsewhere to sell the idea to people that we are so hopelessly, irrevocably, inevitably doomed that the most “dignified” thing to do would be to spend our remaining lives enjoying the “sunshine” rather than actually trying to do anything about the problem.
Your behavior surprises me further when I realize that this is something even Yudkowsky, one of the most pessimistic people in the community, explicitly advises not to do in his “Death With Dignity” post, which seems to suggest, IMO, that your position is even more doomer-ific. Again, this honestly seems crazy to me when proclamations to that effect are coming from someone who claims that they don’t know anything. (Disclaimer: No, I’m not saying that you need to have some special knowledge/permission to be allowed to disagree with Yudkowsky about things without seeming crazy, nor could I plausibly see myself believing such a thing.)
I’d like to dissuade anyone from the notion that I think I’m privy to any information that makes doom or not-doom look inescapable, or that I’m grasping something that pessimistic people aren’t also grasping which avails me of the “truth”. I don’t know much aside from the more basic arguments, and I’m not especially optimistic or pessimistic about our chances, just admittedly highly uncertain.
Yes, I think so. It seems to me that ‘saying what the good is’ has been a two-thousand-year philosophical project on which we’ve made very little progress. Getting it defined formally, within the next few years, to the point where I might be able to write a computer program to tell me which possible outcomes are good, just looks like an impossible task.
For example, we all think that whether a being is conscious makes some moral difference. But we aren’t even close to being able to tell whether a being is conscious in that sense. I’ve never heard anyone give a sensible description of what the ‘hard problem’ even is. That’s one of the hard things about it.
And our formal definition of ‘the good’ needs to be correct. If a few weird edge cases fail under heavy optimization pressure, we just get a paperclipper with weird paperclips that are some parody of what we might actually have wanted.
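(A toy sketch of the failure mode I mean, not anything from the actual alignment literature: the names true_value and proxy_value are purely illustrative. The proxy agrees with the ‘true’ value function on almost every outcome, so a weak optimizer lands roughly where we wanted, while a strong one finds exactly the mis-specified edge case.)

```python
import random

random.seed(0)

def true_value(x):
    # What we actually want: outcomes near 10 are good, everything else is worse.
    return -abs(x - 10)

def proxy_value(x):
    # Our formal stand-in for 'the good': agrees with true_value almost
    # everywhere, but mishandles one weird edge case at very large x.
    if x > 1_000_000:
        return float(x)   # the edge case: the proxy mistakenly rewards it hugely
    return -abs(x - 10)

# Weak optimization: a small search over ordinary outcomes.
weak = max(random.sample(range(100), 20), key=proxy_value)

# Heavy optimization: search the whole outcome space.
strong = max(range(0, 10_000_000, 1_000), key=proxy_value)

print(weak, true_value(weak))      # something near 10: roughly what we wanted
print(strong, true_value(strong))  # an enormous x: great proxy score, terrible true value
```

The point of the sketch is that the proxy is ‘almost right’, and that is precisely what makes heavy optimization against it dangerous.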
For all I know, a universe full of computronium having one vast orgasm really is the highest good. But that seems to be an outcome that we don’t want. Who can say why?
Eliezer himself explained how hopelessly complex and incoherent human values are.
Probably we’d need superhuman help to work out some sort of Coherent Extrapolated Volition (even assuming that makes any sense at all). But creating superhuman help seems to kill us all.
MIRI spent the last ten years or so pursuing the sorts of mathematically rigorous approaches that might, eventually, after a few decades of top-class mathematical effort, solve the easy bit of the problem: ‘given a utility function, make it so’. And as far as I know they discovered that it was all quite a lot harder than it looked. And mathematically rigorous attacks don’t seem to be the sort of thing that current AI methods are amenable to anyway.
No one’s attacking ‘what should that utility function look like?’.
My main worry for the future is that people trying to build aligned AIs will succeed just well enough to create something that’s worse than just destroying everything. But I do think that even that is quite beyond us.
Whereas building a superintelligence out of random bits of crap that will just set off and do random things really well seems to be well within our current powers, and an awful lot of people are hell-bent on doing just that, and it will be here soon.
So the situation seems to me a bit like ‘some homeless lunatic in Hiroshima trying to build a bomb-proof umbrella vs. the Manhattan project’.
Seriously, that’s all I’ve got. On the side of doom, a buggerload of brilliant, motivated people working on a very tractable-looking problem. On the side of continued human existence, some guys, no plan, no progress, and the problem looks impossible.
I name the political movement that I cannot see any reason to start: “Ineffective Doomerism”. If there’s a positive singularity (and quantum suicide makes me think I might see one!), y’all have my permission to laugh at me for the rest of time.
So, and please correct me if I’m wrong, would you say that the main source of your hopelessness comes from the idea of human values being too complex to correctly program into anything? Because I basically agree with that idea, but it doesn’t really inspire much doomerism in me. I already believed that trying to “solve” ethics was pretty much futile before I was introduced to LW, but I never gave it much weight in terms of how much it affects alignment, for the following reason:
I just don’t expect that any of the clever people who I tend to defer to are actually trying to do exactly this; “this” being trying to actually, literally reverse-engineer human values and then encode them.
The idea seems obviously wrong enough that I honestly can’t believe that anyone working in the alignment field who thinks the problem is solvable from at least a technical standpoint (Paul Christiano, Richard Ngo, Nate Soares, etc.) hasn’t already considered it.
However, our conversation here has inspired me to ask a question regarding this in the latest monthly AGI safety questions thread.
That was kind of a long-term source of hopelessness: why I thought Eliezer’s plan wouldn’t work out without a very long time and lots of people working on it. But my current source of short-term hopelessness is that it looks like we’re right on the verge of achieving AGI, and no-one seems to be taking the danger remotely seriously.
It’s like being in a petrol warehouse with a load of monkeys striking matches. We just die by default now, unless something really drastic and surprising happens.
Well, we can agree that the default outcome is probably death.
So, in my previous comment, I explained why I tend to not think Complexity of Value necessarily dooms us. I doubt you find the aforementioned reasoning remotely reassuring, but I’d be interested in finding out why you think that it shouldn’t be. Would you be willing to try and explain that to me?
Hi, so I don’t understand why you’re not worried except that “some clever people don’t seem worried”.
But actually I think all those guys are in fact quite worried. If they aren’t full on doomers then I don’t understand what they’re hoping to do.
So I’ll repeat my argument:
(1) We’re about to create a superintelligence. This is close and there’s no way to stop it.
(2) If we create a superintelligence, then whatever it wants is what is going to happen.
(3) If that’s not what we want, that’s very bad.
(4) We have no idea what we want, not even roughly, let alone in the sense of formal specification.
That’s pretty much it. Which bit do you disagree with?
I never meant to claim that my position was “clever people don’t seem worried so I shouldn’t be”. If that’s what you got from me, then that’s my mistake. I’m incredibly worried as a matter of fact, and much more importantly, everyone I mentioned also is to some extent or another, as you already pointed out. What I meant to say but failed to was that there’s enough disagreement in these circles that near-absolute confidence in doom seems to be jumping the gun. That argument also very much holds against people who are so certain that everything will go just fine.
I guess most of my disagreement comes from 4. Or rather, the implication that having an exact formal specification of human values ready to be encoded is necessarily the only way that things could possibly go well. I already tried to verbalize as much earlier, but maybe I didn’t do a good job of that either.
I wouldn’t call my confidence in doom near-absolute, so much as “very high”! I would have been just as much a doomer in 1950, the last time AI looked imminent, before it was realized that “the hard things are easy and the easy things are hard”.
I wouldn’t be that surprised if it turned out that we’re still a few fundamental discoveries away from AGI. My intuition is telling me that we’re not.
But the feeling that we might get away with it is only coming from a sense that I can easily be wrong about stuff. I would feel the same if I’d been transported back to 1600, made myself a telescope, and observed a comet heading for Earth, but no-one would listen.
“Within my model”, as it were, yes, near-absolute is a fair description.
The long-term problem is that an agent is going to have a goal. And most goals kill us. We get to make exactly one wish, and that wish will come true whether we want it or not. Even if the world were sane, this would be a very, very dangerous situation. I would want to see very strong mathematical proof that such a thing was safe before trying it, and I’d still expect it to kill everyone.
The short term problem is that we’re not even trying. People all over the place are actively building more and more general agents that make plans, with just any old goals, without apparently worrying about it, and they don’t believe there’s a problem.
What on earth do you think might stop the apocalypse? I can imagine something like “take over the world, destroy all computers” might work, but that doesn’t look feasible without superintelligent help, and that puts us in the situation where we have a rough idea what we want, but we still need to find out how to express that formally without it leading to the destruction of all things.
As a very wise man once said: “The only genie to which it is safe to make a wish is one to which you don’t need to make a wish, because it already knows what you want and it is on your side.”