(It’s possible that intentionality isn’t the sharpest distinction between “tools” and “agents”, but it’s the one I see most often emphasized in philosophy of mind, especially with regard to necessary preconditions for the development of any “strong AI”.)
It seems that one could write an AI that is in some sense “provably Friendly” while remaining agnostic about whether that AI is, or will ultimately become, a tool or an agent. It might be that a proposed AI couldn’t be an agent because it couldn’t solve the symbol grounding problem, i.e. because it lacked intentionality; it would then not be an effective FAI, but it might still be Friendly in a certain limited sense. However, if effectiveness is considered a requirement of Friendliness, then to prove that the AI was Friendly one would indeed have to prove in advance either that it could solve the grounding problem, or that the grounding problem isn’t a meaningful concept in the first place. I’m not sure what Eliezer would say about this; given his thinking about “outcome pumps” and so on, I doubt he regards symbol grounding as a fundamental or meaningful problem, and so I doubt he has, or plans to develop, any formal argument that symbol grounding isn’t a fundamental roadblock for his preferred attack on AGI.
I guess I am jumping the gun here, the gun in question being the framework itself. What would a relevant mathematical framework entail?
Your question about what a relevant mathematical framework would entail seems too vague for me to parse; my apologies, it’s likely my exhaustion. But in any case: if minds leave certain characteristic marks on their environment by virtue of having intentional (mental) states, then how precise and deep you can make a distinguishing mathematical framework depends on how sharp a cutoff there is in reality between intentional and non-intentional states. It’s possible that the cutoff isn’t sharp at all, in which case it’s questionable whether the supposed distinction exists or is meaningful. If so, it may simply be impossible to formulate a deep theory that distinguishes agents from tools, or intentional states from non-intentional ones. I think it likely that most AGI researchers, including Eliezer, hold that it is indeed impossible. I don’t think one could prove the non-existence of a sharp cutoff, so Eliezer could justifiably conclude that he didn’t have to prove that his AI would be an “agent” or a “tool”, because he could deny, even without mathematical justification, that the distinction is meaningful.
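To make that slightly more concrete, here is a toy sketch of why the sharpness of the cutoff is what matters. It is purely illustrative and entirely my own construction, not anything Eliezer or anyone else has proposed: suppose, very generously, that the “marks a system leaves on its environment” could be summarized as a single intentionality-like score. Then whether any threshold cleanly sorts systems into “tools” and “agents” depends entirely on whether the scores of the two populations are actually separated:

```python
import random

random.seed(0)

def best_threshold_accuracy(tool_scores, agent_scores):
    """Best accuracy achievable by any single cutoff on the score."""
    n = len(tool_scores) + len(agent_scores)
    best = 0.0
    for t in sorted(set(tool_scores + agent_scores)):
        correct = sum(s < t for s in tool_scores) + sum(s >= t for s in agent_scores)
        best = max(best, correct / n)
    return best

# Case 1: a sharp cutoff exists -- the two kinds of system occupy disjoint score ranges.
tools_sharp = [random.uniform(0.0, 0.4) for _ in range(1000)]
agents_sharp = [random.uniform(0.6, 1.0) for _ in range(1000)]

# Case 2: no sharp cutoff -- the score ranges overlap heavily, as they would if
# "intentionality" came in degrees rather than kinds.
tools_fuzzy = [random.gauss(0.45, 0.2) for _ in range(1000)]
agents_fuzzy = [random.gauss(0.55, 0.2) for _ in range(1000)]

print(best_threshold_accuracy(tools_sharp, agents_sharp))  # ~1.0: a crisp distinction is recoverable
print(best_threshold_accuracy(tools_fuzzy, agents_fuzzy))  # ~0.6: barely better than chance
```

If the second case is how reality actually is, then the best any framework could deliver is a graded, somewhat arbitrary classification, which is roughly what I mean by saying no deep distinguishing theory would be available.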
I’m tired, apologies for any errors.