Apologies for the strange phrasing; I’ll try to improve my writing in that area. I actually fully agree with you that [assuming even a “slightly unaligned”[1] AGI will kill us], even highly educated people who put a match to kerosene will get burned. By “sufficiently educated,” I meant to suggest that, in some sense, there is no sufficiently educated person on this planet, at least not yet.
“…as if any other kind of people could exist, or it was a problem that could be solved by education.”
Well, I do think this is a problem that can be solved with education, at least in theory. The trouble is that we have no teachers (or even a lesson plan), and the final is due tomorrow. Theoretically, though, I don’t see any strong reason why we couldn’t find a way to either teach ourselves or cheat, if we get lucky and have the time. Outside of this (rather forced) metaphor, I meant to convey my admittedly optimistic sense that there are plausible futures in which AI researchers exist who do have the answer to the alignment problem. Even in such a world, of course, people who don’t bother to learn the solution, or who act in haste, could still end the world.
My sense is that you believe (at this point in time) that there is in all likelihood no such world where alignment is solved, even if we have another 50+ years before AGI. Please correct me if I’m wrong about that.
To be honest, this, more than anything else, is where I do not (yet) understand the source of your pessimism. If you could convince me that all current or plausible near-term alignment research is doomed to fail, I’d be willing to go the rest of the way with you.
[1] I assume your reaction to that phrase will be something along the lines of “but there is no such thing as ‘slightly unaligned’!” I word it that way because that stance doesn’t seem to be universally acknowledged even within the EA community, so it seemed best to allow for the possibility, given that I’m aiming for a diverse audience.
I agree that a solution is in theory possible. What has always seemed to me the uniquely difficult and dangerous thing about AI alignment is that you’re creating a superintelligent agent, which means there may only ever be a single chance to try turning on an aligned system.
But I can’t think of a single example of a complex system created perfectly on the first try. Every successful engineering project in history has been accomplished through trial and error.
Some people have speculated that we can do trial and error in domains where mistakes are less catastrophic, but it’s not clear whether such AI systems will tell us much about how more powerful systems will behave. It’s this “single chance to transition from a safe to a dangerous operating domain” aspect of the problem that makes AI alignment so uniquely difficult.