You’re missing the steps whereby the AI gets to a position of power. AI presumably goes from a position of no power and moderate intelligence (where it is now) to a position of great power and superhuman intelligence, whereupon it can do what it wants. But we’re not going to deliberately allow such a position unless we can trust it. We can’t let it get superintelligent or powerful-in-the-world unless we can prove it will use its powers wisely. Part of that is being non-deceptive. Indeed, if we can build it provably non-deceptive, we can simply ask it what it intends to do before we release it. So deception is a lot of what we worry about.
Of course we work on other problems too, like making it helpful and obedient.
a position of no power and moderate intelligence (where it is now)
Most people are quite happy to give current AIs relatively unrestricted access to sensitive data, APIs, and other powerful levers for effecting far-reaching change in the world. So far, this has actually worked out totally fine! But that’s mostly because the AIs aren’t (yet) smart enough to make effective use of those levers (for good or ill), let alone be deceptive about it.
To the degree that people don’t trust AIs with access to even more powerful levers, it’s usually because they fear the AI getting tricked by adversarial humans into misusing those levers (e.g. through prompt injection), not fear that the AI itself will be deliberately tricky.
But we’re not going to deliberately allow such a position unless we can trust it.
One can hope, sure. But what I actually expect is that people will generally give AIs more power and trust as they get more capable, not less.
You’re missing the steps whereby the AI gets to a position of power. AI presumably goes from a position of no power and moderate intelligence (where it is now) to a position of great power and superhuman intelligence, whereupon it can do what it wants. But we’re not going to deliberately allow such a position unless we can trust it. We can’t let it get superintelligent or powerful-in-the-world unless we can prove it will use its powers wisely. Part of that is being non-deceptive. Indeed, if we can build it provably non-deceptive, we can simply ask it what it intends to do before we release it. So deception is a lot of what we worry about.
Of course we work on other problems too, like making it helpful and obedient.
Most people are quite happy to give current AIs relatively unrestricted access to sensitive data, APIs, and other powerful levers for effecting far-reaching change in the world. So far, this has actually worked out totally fine! But that’s mostly because the AIs aren’t (yet) smart enough to make effective use of those levers (for good or ill), let alone be deceptive about it.
To the degree that people don’t trust AIs with access to even more powerful levers, it’s usually because they fear the AI getting tricked by adversarial humans into misusing those levers (e.g. through prompt injection), not fear that the AI itself will be deliberately tricky.
One can hope, sure. But what I actually expect is that people will generally give AIs more power and trust as they get more capable, not less.