I don’t want to make claims about what desires in this category are wise or unwise for a human; I make no pretense to wisdom :)
But it’s necessary for getting good outcomes out of a superintelligence!
I’ve heard good things about occasionally using tobacco to help focus (like how I already use coffee), but I’m terrified to touch it because I’m concerned I’ll get addicted. Bad demon!
Makes sense. I think I have a somewhat better idea of how you see the demon thing now.
I disagree with bad demon here. I’ve used nicotine for that purpose and it didn’t feel like much of a threat, but my experience with opioids did have enough of a tug that it scared me away from doing it a second time. After more time for the demon to work though, I don’t find the idea appealing anymore and I’m pretty confident that I wouldn’t be tempted even if I took some again. You just don’t want to get stuck between the update of “Ooh, this stuff feels really good” and the update of “It’s not though, lol. It’s a lie, and chasing it leads to ruin. How tempting is it to ruin your life chasing a lie?”. It’s a “valley of bad rationality” problem, if you lack the foresight to avoid it.
Anyway, I feel like we’re getting off-track: I’m really much more interested in talking about AI alignment than about humans.
I don’t think you can actually get away from it. For one, you can’t design an AI to give you what you want if you don’t know what you want, and you don’t know what you want unless you’re aligned yourself. If you understand the process of human alignment, then you can conceivably create an AI which will help you along in the right direction. If you don’t have that, even if you manage to hit what you’re aiming at you’re likely to be a somewhat more sophisticated version of a dope fiend aiming for more dope, and to get the resulting outcomes. Because of Goodhart’s law, “using AI to get what I already know I want” falls apart once AI becomes sufficiently powerful.
For two, I don’t think anyone has anywhere near a good enough idea of how alignment works in general for it to make sense to neglect the one example we have a lot of experience with and can easily experiment on. It’s one thing to not trap yourself in the ornithopter box, but wings are everywhere for a reason, and until you understand that reason, have a solid understanding of aerodynamics, and have better flying machines than birds, it is premature to neglect studying what’s going on with bird wings. Even with a pretty solid understanding of aerodynamics, studying birds gives some neat solutions to things like adverse yaw and ideal lift distributions. You seem to be getting at this at the end of your comment.
For three, if we’re talking about “brain-like” AGI and training them in ways analogous to getting a kid to be a moon fan, it’s important to understand what is actually happening when a kid becomes a fan of “the moon” and where that’s likely to go wrong. The AIs we have now are remarkably human in their training process and failures, so unless we take a massive departure from this, understanding how human alignment works is directly relevant.