Is there a proof that it’s possible to prove Friendliness?
No. There’s also no proof that it’s possible to prove that P!=NP, and for the Friendliness problem it’s much, much less clear what the problem even means. You aren’t entitled to that particular proof; it’s not expected to be available until it’s no longer needed. (Many difficult problems get solved, or almost solved, without a proof of their solvability appearing in the interim.)
Why is it plausible that Friendliness is provable? Or is it more that the problem is so important that it’s worth trying regardless?
There is no clearly defined or motivated problem of “proving Friendliness”. We need to understand what goals are, what humane goals are, what process can be used to access their formal definition, and what kinds of things can be done with them, how, and to what end. We need to understand these things well, which (on a psychological level) triggers an association with mathematical proofs, and will probably actually involve some mathematics suited to the task. Whether the answers take the form of something describable as “provable Friendliness” seems to me an unclear/unmotivated consideration. Unpacking that label might make it possible to provide a more useful response to the question.
I wonder what SI would do next if they could prove that Friendly AI was not possible; for example, if it could be shown that value drift is inevitable and that utility functions are unstable under recursive self-improvement.
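To make the value-drift hypothetical a bit more concrete, here is a minimal toy sketch (my own illustration, not anything from SI or the commenter): it assumes, purely for the sake of example, that each round of self-modification amounts to copying a small vector of utility weights with a tiny random error, and it only shows that such errors accumulate across many rewrites rather than cancelling out. It says nothing about whether real self-improving systems would actually transmit their goals this way.

```python
import random

random.seed(0)

# Hypothetical utility-function weights for a toy agent (illustrative numbers).
original = [1.0, 0.5, -0.25]
weights = list(original)

# Model each round of "self-improvement" as the agent rewriting itself and
# copying its utility weights with a small random relative error (+/- 0.1%).
for step in range(1000):
    weights = [w * (1 + random.uniform(-0.001, 0.001)) for w in weights]

# The per-step errors are tiny, but they accumulate instead of cancelling exactly.
drift = max(abs(w - o) for w, o in zip(weights, original))
print(f"largest weight drift after 1000 rewrites: {drift:.4f}")
```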
That doesn’t seem like the only circumstance in which FAI is not possible. If moral nihilism is true, then FAI is impossible even if value drift is not inevitable.
In that circumstance, shouldn’t we try to make any AI we decide to build “friendly” to present-day humanity, even if it wouldn’t be friendly to Aristotle or Plato or Confucius? Based on the hidden-complexity-of-wishes analysis, consistency with our current norms is still plenty hard.
My concerns are more that it will not be possible to adequately define “human”, especially as transhuman tech develops, and that there might not be a good enough way to define what’s good for people.
As I understand it, the modest goal of building an FAI is that of giving an AGI a push in the “right” direction, what EY refers to as the initial dynamics. After that, all bets are off.