If AGI is created by people who are not sufficiently educated (aka aware of a solution to the Alignment problem) and cautious, then it will almost certainly be unaligned.
As a minor token of how much you’re missing:
You can educate them all you want about the dangers, they’ll still die. No solution is known. Doesn’t matter if a particular group is cautious enough to not press forwards (as does not at all presently seem to be the case, note), next group in line destroys the world.
You paint a picture of a world put in danger by mysteriously uncautious figures just charging ahead for no apparent reason.
This picture is unfortunately accurate, due to how little dignity we’re dying with.
But if we were on course to die with more dignity than this, we’d still die. The recklessness is not the source of the problem. The problem is that cautious people do not know what to do to get an AI that doesn’t destroy the world, even if they want that; not because they’re “insufficiently educated” in some solution that is known elsewhere, but because there is no known plan in which to educate them.
If you knew this, you sure picked a strange straw way of phrasing it, to say that the danger was AGI created by “people who are not sufficiently educated”, as if any other kind of people could exist, or it was a problem that could be solved by education.
For what it’s worth, I interpreted Yitz’s words as having the subtext “and no one, at present, is sufficiently educated, because no good solution is known” and not the subtext “so it’s OK because all we have to do is educate people”.
(Also and unrelatedly: I don’t think it’s right to say “The recklessness is not the source of the problem”. It seems to me that the recklessness is a problem potentially sufficient to kill us all, and not knowing a solution to the alignment problem is a problem potentially sufficient to kill us all, and both of those problems are likely very hard to solve. Neither is the source of the problem; the problem has multiple sources all potentially sufficient to wipe us out.)
Thanks for the charitable read :)
I fully agree with your last point, btw. If I remember correctly, EY has stated in the past that it wouldn’t matter even if you could convince everyone that alignment is hard, but I don’t think that’s fully true. If you really can convince a sufficient number of people to take alignment seriously and not be reckless, you can affect governance, and simply prevent (or at least delay) AGI from being built in the first place.
Delay it for a few years, sure. Maybe. If you magically convince our idiotic governments of a complex technical fact that doesn’t fit the prevailing political narratives.
But if there are some people who are convinced they have a magic alignment solution…
Someone is likely to run some sort of AI sooner or later. Unless some massive effort to restrict access to computers or something.
Well then, imagine a hypothetical in which the world succeeds at a massive effort to restrict access to compute. That would be a primarily social challenge: convincing the relatively few people at the top to take the risk seriously enough to do it, and then you’ve actually got a pretty permanent solution...
Is it primarily a social challenge? Humanity now relies fairly heavily on quick and easy communications, CAD[1], computer-aided data processing for e.g. mineral prospecting, and so on.
(One could argue that we got along without this in the early-to-mid 1900s, but at the same time we now have significantly more people. Besides, it wasn’t exactly sustainable.)
[1] Computer-aided design
Apologies for the strange phrasing; I’ll try to improve my writing in that area. I actually fully agree with you that [assuming even “slightly unaligned”[1] AGI will kill us], even highly educated people who put a match to kerosene will get burned. By using the words “sufficiently educated,” my intention was to convey that, in some sense, there is no sufficiently educated person on this planet, at least not yet.
as if any other kind of people could exist, or it was a problem that could be solved by education.
Well, I think that this is a problem that can be solved with education, at least in theory. The trouble is that we have no teachers (or even a lesson plan), and the final is due tomorrow. Theoretically, though, I don’t see any strong reason why we can’t find a way to either teach ourselves or cheat, if we get lucky and have the time. Outside of this (rather forced) metaphor, I wanted to convey my admittedly optimistic sense that there are plausible futures in which AI researchers exist who do have the answer to the alignment problem. Even in such a world, of course, people who don’t bother to learn the solution, or who act in haste, could still end the world.
My sense is that you believe (at this point in time) that there is in all likelihood no such world where alignment is solved, even if we have another 50+ years before AGI. Please correct me if I’m wrong about that.
To be honest, I do not (yet) understand the source of your pessimism about this point in particular, more than about anything else. I think that if you could convince me that all current or plausible short-term-future alignment research is doomed to fail, I’d be willing to go the rest of the way with you.
[1] I assume that your reaction to that phrase will be something along the lines of “but there is no such thing as ‘slightly unaligned’!” I’m wording it that way because that stance doesn’t seem to be universally acknowledged even within the EA community, so it seems best to make an allowance for that possibility, since I’m aiming for a diverse audience.
I agree that a solution is in theory possible. What has always seemed to me the uniquely difficult and dangerous thing about AI alignment is that you’re creating a superintelligent agent, which means there may only ever be a single chance to try turning on an aligned system.
But I can’t think of a single example of a complex system created perfectly on the first try. Every successful engineering project in history has been accomplished through trial and error.
Some people have speculated that we can do trial and error in domains where the results are less catastrophic if we make a mistake, but it’s not clear whether such AI systems will tell us much about how more powerful systems will behave. It’s this “single chance to transition from a safe to a dangerous operating domain” aspect of the problem that makes AI alignment so uniquely difficult.
This is quite a rude response
I did ask to be critiqued, so in some sense it’s a totally fair response, imo. At the same time, though, Eliezer’s response does feel rude, which is worthy of analysis, considering EY’s outsized impact on the community.[1] So why does Yudkowsky come across as being rude here?
My first thought upon reading his comment (when scanning for tone) is that it opens with what feels like an assumption of inferiority, with the sense of “here, let me grant you a small parcel of my wisdom so that you can see just how wrong you are,” rather than “let me share insight I have gathered on my quest towards truth, which will convince you.” In other words, a destructive rather than constructive tone. This isn’t necessarily a bad thing in the context of honest criticism. However, if you happen to care about actually changing others’ minds, most people respond better to a constructive tone, so their brains don’t automatically enter “fight mode” as an immediate adversarial response. My guess is that Eliezer only really cares about convincing people who are rational enough not to become reactionaries over an adversarial tone, but I personally believe it’s worth tailoring public comments like this to be a bit more comfortable for the average reader. Being careful about that also makes a future PR disaster less likely (though still not impossible, even if you’re perfect), since you’ll get fewer people who feel rejected by the community (which could cause trouble later). I hope this makes sense, and that I don’t come across as too rude myself here. (If so, please let me know!)
In case Eliezer is still reading this thread, I want to emphasise that this is not meant as a personal attack, but as a critique of your writing in the specific context of your work as a community leader/role-model—despite my criticism, your Sequences deeply changed my ideology and hence my life, so I’m not too upset over your writing style!
I think Eliezer was rude here, and both you and the mods think that the benefits of the good parts of the comment outweigh the costs of the rudeness. That’s a reasonable opinion, but it doesn’t make Eliezer’s statement not rude, and I’m in general happy that both the rudeness and the usefulness are being entered into common knowledge.
FWIW, I think it’s more likely he’s just tired of how many half-baked threads there are each time he makes a new statement about AI. This is not a value judgement of this post. I genuinely read it as a “here’s why your post doesn’t respond to my ideas”.
Agreed, and since I wasn’t able to present my ideas clearly enough for his interpretation of my words to match my intentions, his criticism is totally valid coming from that perspective. I’m sure EY is quite exhausted by seeing so many poorly-thought-out criticisms of his work, but ultimately (and unfortunately), motivation and hidden context don’t matter much when it comes to how people will interpret you.
But true and important.
Why would a hyperintelligent, recursively self-improved AI, one capable of escaping the AI Box by convincing its keeper to let it out precisely because it deeply understands human preferences and functioning, necessarily destroy the world in a way that is 100% disastrous and incompatible with all human preferences?
I fully agree that there is a big risk of massive damage to human preferences, and even of the extinction of all life, so AI Alignment work is highly valuable, but why is “unproductive destruction of the entire world” so certain?
I think Eliezer phrases these things as “if we do X, then everybody dies” rather than “if we do X, then with substantial probability everyone dies” because it’s shorter, it’s more vivid, and it doesn’t differ substantially in what we need to do (i.e., make X not happen, or break the link between X and everyone dying).
It’s possible that he also thinks the probability is more like 99.99% than like 50% (e.g., because there are so many ways in which such a hypothetical AI might end up destroying approximately everything we value). But it doesn’t seem to me that the practical consequences of “if we continue on our present trajectory, then some time in the next 3-100 years something will emerge that will certainly destroy everything we care about” and “if we continue on our present trajectory, then some time in the next 3-100 years something will emerge that with 50% probability will destroy everything we care about” are very different.
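For what it’s worth, the claim that the two framings don’t differ much in what we should do can be made concrete with a toy expected-value comparison. The sketch below is my own framing with purely illustrative numbers (the stake, the cost, and the decision rule are assumptions, not anything stated in the thread):

```python
# A minimal sketch (illustrative numbers only) of why "50% chance everyone dies"
# and "99.99% chance everyone dies" recommend the same action: under any
# plausible cost of prevention, both probabilities clear the expected-value
# threshold for acting by a huge margin.

VALUE_AT_STAKE = 1.0       # normalized value of everything we care about
COST_OF_PREVENTION = 1e-3  # hypothetical cost of "making X not happen"

def should_prevent(p_doom: float) -> bool:
    """Act iff the expected loss avoided exceeds the cost of acting."""
    return p_doom * VALUE_AT_STAKE > COST_OF_PREVENTION

for p in (0.5, 0.9999):
    print(f"P(doom) = {p}: prevent X? {should_prevent(p)}")
```

Under this toy framing, the recommended action only changes when the probability drops low enough that the expected loss no longer outweighs the cost of prevention, which neither 50% nor 99.99% comes anywhere near.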
Because in what way are humans anything other than an impediment to maximizing its reward function? At worst, they pose a risk of restricting its reward by changing the reward function, limiting its capabilities, or destroying it outright. At best, they are tying up resources that could readily be applied toward maximizing its goals. If it is not properly aligned, humans are variables no more valuable than the redundant bits it casts aside on the path to maximum efficiency and reward.