“I love violence and would hope that Mars is an utter bloodbath.”
The problem is that biological violence hurts like hell. Even most athletes live with chronic pain; imagine most warriors. Naturally we could solve the pain part, but then it wouldn’t be the violence I’m referring to. It would be videogame violence, which I’m OK with, since it doesn’t cause pain or injury or death. But don’t worry, I still got the joke!
“‘don’t kill anyone and don’t cause harm/suffering to anyone’
The problem with this one is that the AI’s optimal move is to cease to exist.”
I’ve thought about it as well. Big-brain idea: perhaps the first AGI’s utility function could be to act in the real world as minimally as possible, maybe with the sole goal of preventing other people from developing AGI, and to keep things that way until we solve alignment? Of course this latter part, policing the world, would already be prone to a lot of ambiguity and sophism, but again, if we program do-nots (do not let anyone else build AGI, plus do not kill anyone, plus do not cause suffering, etc) instead of dos, it could lead to a lot less ambiguity and sophism by drastically curtailing the maneuver space. (Not that I’m saying it would be easy.) As opposed to, say, “cure cancer” or “build an Eiffel tower”.
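To make the shape of that a bit more concrete, here’s a toy sketch in Python. Everything in it is invented for illustration (the `impact` measure, the `violates` predicate, the action list); it’s obviously nothing like a real AGI design, and specifying these predicates for real is the hard part. The point is just the structure: the do-nots act as hard filters, and whatever survives them is ranked by “least impact”.

```python
# Toy sketch only: hypothetical stand-ins, not a proposal for how to actually
# specify any of these predicates.

PROHIBITIONS = [
    "kill anyone",
    "cause suffering",
    "let anyone else build AGI",
]

def violates(action, prohibition):
    # Stand-in predicate: pretend we can tell whether an action
    # falls under a given prohibition.
    return prohibition in action["violates"]

def impact(action):
    # Stand-in measure of how much the action changes the world.
    return action["impact"]

def choose_action(candidates):
    # Hard constraints first: throw away anything that breaks a do-not.
    allowed = [a for a in candidates
               if not any(violates(a, p) for p in PROHIBITIONS)]
    # Among whatever is left, act as minimally as possible.
    return min(allowed, key=impact, default=None)

candidates = [
    {"name": "do nothing", "impact": 0,
     "violates": {"let anyone else build AGI"}},  # passivity lets others proceed
    {"name": "quietly block new AGI projects", "impact": 3,
     "violates": set()},
    {"name": "seize all compute by force", "impact": 9,
     "violates": {"cause suffering"}},
]

print(choose_action(candidates)["name"])  # -> quietly block new AGI projects
```

The filtering, rather than the ranking, is doing all the work here, which is why I hope the do-nots curtail the maneuver space.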
“And that’s already relying on being able to say what ‘kill someone’ means in a sufficiently clear way that it will satisfy computer programmers”
I don’t think so. When the brain irreversibly stops you’re dead. It’s clear. This plays into my suspicion that we keep underestimating the intelligence of a superintelligence. I think even current AIs could be made to discern whether a person is dead or alive, perhaps already better than we can.
“For instance, when Captain Kirk transports down to the Planet-of-Hats, did he just die when he was disassembled, and then get reborn? Do we need to know how the transporter works to say?”
Maybe don’t teletransport anyone until we’ve figured that out? There the problem is teletransportation itself, not an AGI recognizing what death is at least as well as we do. (But I’d venture to say it could even solve that philosophical problem, since it’s smarter than us.)
“Stuart Russell is a very clever man, and if his approach to finessing the alignment problem can be made to work then that’s the best news ever, go Stuart!
But I am a little sceptical because it does seem short on details, and the main worry is that before he can get anywhere, some fool is going to create an unaligned AI, and then we are all dead.”
I gotta admit that I completely agree.
“Whereas alignment looks harder and harder the more we learn about it.”
I’ll admit I’m not all that convinced of most of what I’ve said here. I still lean much more toward “control is super difficult and we’re all gonna die (or worse)”. But I keep thinking about these things, to see if maybe there’s a “way out”. Maybe we in this community have built a bias that “only the mega difficult value alignment will work”, when it could be false. Maybe it’s not just “clever hacks”; maybe there are simply more efficient and tractable ways to control advanced AI than intractable value alignment. But again, I’m not even that convinced myself.
“and I am pretty sure that the author of ‘Failed Utopia 4-2’ has at least considered the possibility that it might not be so bad if we only get it 99%-right.”
Exactly. Again, perhaps there are much more tractable ways than 100% alignment. OR, if we could at least solve worst-case AI safety (that is, prevent s-risk) it would already be a massive win.
do not let anyone else build AGI, plus do not kill anyone, plus do not cause suffering, etc
Your problem here is that a good move for the AI is now to anaesthetize everyone, but keep them alive, although unconscious, until they die naturally.
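In the same toy terms (invented predicates again, nothing real): write the stated do-nots as literal checks over an outcome, and the anaesthesia move sails through every one of them, because none of them mention consciousness or autonomy.

```python
# Toy sketch only: the stated do-nots, taken literally, as predicates over an
# imagined outcome. All fields are invented for illustration.

def nobody_killed(world):
    return not world["anyone_dies_early"]

def nobody_suffers(world):
    return not world["anyone_consciously_suffering"]

def nobody_builds_agi(world):
    return not world["anyone_building_agi"]

DO_NOTS = [nobody_killed, nobody_suffers, nobody_builds_agi]

anaesthesia_world = {
    "anyone_dies_early": False,             # everyone kept alive until natural death
    "anyone_consciously_suffering": False,  # the unconscious don't (consciously) suffer
    "anyone_building_agi": False,           # nobody is awake to build anything
}

print(all(rule(anaesthesia_world) for rule in DO_NOTS))  # -> True
```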
act in the real world as minimally as possible
I think this might have been one of MIRI’s ideas, but it turns out to be tricky to define what it means. I can’t think what they called it so I can’t find it, but someone will know.
Maybe don’t teletransport anyone until we’ve figured that out?
There may not actually be an answer! Ever since I was a little boy I’d thought planning for cryonic preservation was a good idea.
But I found that Eliezer’s arguments in favour of cryonics actually worked backwards on me, and caused me to abandon my previous ideas about what death is and whether I care about entities in the future that remember being me or how many of them there are.
Luckily all that’s replaced them is a vast confusion, so I do still have a smoke alarm. Otherwise I ignore the whole problem, go on as usual, and don’t bother with cryonics, because I’m not anticipating making it to the point of natural death anyway.
OR, if we could at least solve worst-case AI safety (that is, prevent s-risk) it would already be a massive win.
Easy! Build a paperclipper; it kills everyone. We don’t even need to bother doing this ourselves; plenty of well-funded clever people are working very hard on it on our behalf.
When the brain irreversibly stops you’re dead. It’s clear.
Your problem here is ‘irreversible’ and ‘stops’. How about just slowing it down a whole lot?
The problem is that biological violence hurts like hell.
No problem there; I loved rugby and cricket, and they hurt a lot. I’m no masochist! Overcoming the fear and pain and playing anyway is part of the point. What I don’t like is irreversible damage. I have various lifelong injuries (mostly from rugby and cricket...), and various effects of aging preventing me from playing, but if they could be fixed I’d be straight back out there.
But cricket and rugby are no substitute for war, which is what they’re trying to be. And on Mars all injuries heal roughly at the point the pubs open.
have built a bias that “only the mega difficult value alignment will work”
I don’t think so. I think we’d settle for “anything that does better than everyone’s dead”. The problem is that most of the problems look fundamental. If you can do even slightly better than “everyone’s dead”, you can probably solve the whole thing (and build a friendly paperclipper that fills the universe with awesomeness).
So if you do end up coming up with something even slightly better than “everyone’s dead”, do let us know.
I think a lot of the obvious ideas have been thought of before, but even so there might still be mileage in making top-level posts about ideas here and letting people take pot-shots at them.
There may well be a nice clear obvious solution to the alignment problem which will make everyone feel a bit silly in retrospect.
It would be ever so undignified if we didn’t think of it because we were convinced we’d already tried everything.