FAI is about being reliably harmless. Whether the outcome seems good in the short term is tangential. Even a “good” AI ought to be considered unfriendly if it’s opaque to proof—what can you possibly rely upon? No amount of demonstrated good behavior can be trusted. It could be insincere, it could be sincere but fatally misguided, it could have a flaw that will distort its goals after a few recursions. We would be stupid to just run it and see.
FAI is about being reliably harmless. Whether the outcome seems good in the short term is tangential. Even a “good” AI ought to be considered unfriendly if it’s opaque to proof—what can you possibly rely upon? No amount of demonstrated good behavior can be trusted. It could be insincere, it could be sincere but fatally misguided, it could have a flaw that will distort its goals after a few recursions. We would be stupid to just run it and see.
At which point you are starting to think of what it takes to make not just informally “Good” AI, but an actually Friendly AI.