It does seem like an interesting question. But the most obvious flaw is that we still don’t have the starting point: software does what we tell it to do, not what we want, and the two usually differ. I don’t immediately see any way to get there without superintelligence.
Holden Karnofsky proposed starting with an Oracle AI that tells us what it would do if we gave it different goal systems. But if we avoided giving it any utility function of its own, the programmers would need to not only think of every question (regarding every aspect of “what it would do”), but also create an interface for each sufficiently new answer. I’ll go out on a limb and say this will never happen (much less happen correctly) if someone in the world can just create an ‘Agent AI’.
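To make the asymmetry concrete, here is a minimal sketch of the two shapes of system being contrasted. Every name in it (`OracleAI`, `AgentAI`, `Plan`) is hypothetical, invented purely for illustration; this is the structure of the argument, not Karnofsky’s actual proposal. The point is that the oracle path costs human effort per question asked and per kind of answer returned, while the agent path costs none.

```python
# Purely illustrative sketch of the tool-vs-agent asymmetry described above.
# All names here are hypothetical, not anyone's real design.

from dataclasses import dataclass


@dataclass
class Plan:
    """A description of what the AI *would* do under a given goal system."""
    goal_system: str
    description: str


class OracleAI:
    """Tool-style AI: it only answers questions; humans stay in the loop."""

    def report(self, goal_system: str) -> Plan:
        # Humans must think to ask this question, and must build an
        # interface (a renderer, a review process) for each sufficiently
        # new kind of answer it might return.
        return Plan(goal_system, f"hypothetical plan under '{goal_system}'")


class AgentAI:
    """Agent-style AI: it closes the loop itself; no human interface needed."""

    def __init__(self, goal_system: str) -> None:
        self.goal_system = goal_system

    def act(self) -> None:
        # No question to pose, no answer to render: it just executes.
        print(f"acting directly on goal system '{self.goal_system}'")


# The oracle path requires per-question, per-answer human work ...
plan = OracleAI().report("maximize approval")
print(f"review needed: {plan.description}")

# ... while the agent path requires none, which is why the comment predicts
# the careful oracle workflow loses out if anyone can just build an agent.
AgentAI("maximize approval").act()
```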