Even if we only have short-term, myopic AI, misaligned AI is still misaligned. It looks like social media algorithms promoting clickbait, like self-driving cars turning themselves off half a second before an inevitable crash, like chatbot recommendation systems telling you what you want to hear, never mind whether it's true.
This is a world where AI is widely used and full of non-world-destroying bugs. Self-driving cars have been patched and twiddled until they usually work. But on Tuesdays when the moon is waning, they will tend to change lanes into the rightmost lane, and no one knows why.
This is a suboptimal world, but also one where humans could survive. You're describing dangers that are usually no worse than what we already live (and die) with. (SDCs can easily be measurably much better than human drivers and pilots, even if they do have odd tendencies.)
Note that the "SDC turning itself off half a second before a crash" is actually working as intended. If a crash is detected as inevitable (because the high-level policy failed to work), you want a microcontroller running plain C code to order maximum braking force. Current systems more or less do work this way.
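A minimal sketch of the fallback logic described above, in plain C. Everything here is hypothetical: `read_time_to_collision_ms` and `set_brake_force` are made-up stand-ins for real sensor and actuator interfaces, and the threshold is just the "half a second" from the discussion, not a real system parameter.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_BRAKE_FORCE   255  /* full braking command (hypothetical units) */
#define CRASH_IMMINENT_MS 500  /* "half a second before a crash" */

/* Hypothetical hardware stubs, so the sketch is self-contained.
   A real system would read a sensor-fusion estimate and drive an actuator. */
static uint8_t last_brake_cmd = 0;
static uint32_t read_time_to_collision_ms(void) { return 300; }
static void set_brake_force(uint8_t force) { last_brake_cmd = force; }

/* One iteration of the watchdog loop: if the estimated time to
   collision drops below the threshold, override whatever the
   high-level driving policy is doing and command maximum braking.
   Returns true if it intervened. */
bool crash_fallback_step(void) {
    uint32_t ttc = read_time_to_collision_ms();
    if (ttc < CRASH_IMMINENT_MS) {
        set_brake_force(MAX_BRAKE_FORCE);
        return true;
    }
    return false;
}
```

The point of keeping this layer in simple, verifiable C rather than in the learned policy is exactly the one made above: when the clever component has already failed, you want the dumbest possible component to do the one safe thing.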
I wasn't claiming this was a particularly bad world. I was just disagreeing with the idea that myopic AI = aligned AI.
On the turning-itself-off thing: I was thinking of the Tetris bot that learned to pause the game.