No one actually knows the exact task-difficulty threshold, but the intuition is that once a task is hard enough, any AI capable of completing the task is also capable of thinking of strategies that involve betraying its human creators. However, even if I don’t know the exact threshold, I can think of examples that should definitely be above the line. Starting a billion-dollar company seems pretty difficult, but it could maybe be achieved by a special-purpose algorithm that just plays the stock market really well. But if we add a few more stipulations, like that the company has to make money by building an actual product, in an established industry with lots of competition, then it can probably only be done by a dangerous algorithm. It’s not a very big step from “figuring out how to outwit your competitors” to “realizing that you could outwit humans in general”.
An implicit assumption here is that I’m drawing the line between “safe” and “dangerous” at the point where the algorithm realizes that it could potentially achieve higher utility by betraying us. It’s possible that an algorithm could realize this, but still not be strong enough to “win” against humanity.
The easiest way is probably to build a modestly sized software company and then find a way to destabilize the government and cause hyperinflation.
I think the rule of thumb should be: if your AI could be intentionally deployed to take over the world, it’s highly likely to do so unintentionally.