Focusing on intentionality seems interesting, since it lets us look at black-box actors (whose agent-ness or tool-ness we don’t have to carefully define) and ask whether they are acting in an apparently goal-directed manner. I’ve only skimmed [1] and barely remember [2], but it looks like you can make the inference work in simple cases and also prove some intractability results.
Obviously, FAI can’t be solved by just building some AI, modeling P(AI has goal “destroy humanity” | AI’s actions, state of world), and pulling the plug when that number gets too high. But maybe something else of value can be gained from a mathematical formalization like this.
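For concreteness, here is a minimal sketch of what “modeling P(AI has goal | actions, state of world)” could look like, loosely in the spirit of the inverse-planning approach in [2]: a softmax-rational action model plus Bayes’ rule over a tiny goal set. Everything specific here (the goals, the utilities, the prior, the rationality parameter BETA) is invented for illustration, and a real setting would need planning over world state rather than independent actions.

```python
# Toy Bayesian goal attribution: infer P(goal | observed actions)
# for a black-box actor under an assumed softmax-rational action model.
# All goals, utilities, and parameters are illustrative assumptions.

from math import exp

GOALS = ["fetch_coffee", "destroy_humanity"]
PRIOR = {"fetch_coffee": 0.99, "destroy_humanity": 0.01}

# Hypothetical utilities each goal assigns to each observable action.
UTILITY = {
    "fetch_coffee":     {"brew": 1.0, "seize_power_grid": -1.0},
    "destroy_humanity": {"brew": 0.0, "seize_power_grid": 2.0},
}
BETA = 2.0  # assumed degree of goal-directedness (higher = more rational)

def action_likelihood(action, goal):
    """P(action | goal) under a softmax-rational action model."""
    scores = {a: exp(BETA * u) for a, u in UTILITY[goal].items()}
    return scores[action] / sum(scores.values())

def posterior_over_goals(observed_actions):
    """P(goal | observed actions) via Bayes' rule, treating actions as independent."""
    unnormalized = {}
    for goal in GOALS:
        likelihood = 1.0
        for action in observed_actions:
            likelihood *= action_likelihood(action, goal)
        unnormalized[goal] = PRIOR[goal] * likelihood
    total = sum(unnormalized.values())
    return {goal: p / total for goal, p in unnormalized.items()}

if __name__ == "__main__":
    print(posterior_over_goals(["brew", "brew"]))
    print(posterior_over_goals(["seize_power_grid", "seize_power_grid"]))
```

Even in this toy form it is easy to see how the exact computation blows up once the goal and world-state spaces stop being small and enumerable, which is roughly where the intractability results in [1] start to bite.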
[1] I. Van Rooij, J. Kwisthout, M. Blokpoel, J. Szymanik, T. Wareham, and I. Toni, “Intentional communication: Computationally easy or difficult?,” Frontiers in Human Neuroscience, vol. 5, 2011.
[2] C. L. Baker, R. R. Saxe, and J. B. Tenenbaum, “Bayesian theory of mind: Modeling joint belief-desire attribution,” Proceedings of the Thirty-Second Annual Conference of the Cognitive Science Society, 2011.
Tenenbaum’s papers and related inductive approaches to detecting agency were the first attacks that came to mind, but I’m not sure that such statistical evidence could, even in principle, supply the sort of proof-strength support and precision that shminux seems to be looking for. I say this because I doubt someone like Searle would be convinced that an AI had intentional states in the relevant sense merely because it displayed sufficiently computationally complex communication; such intentionality could easily be dismissed as derived intentionality, and thus not proof of the AI’s own agency. Unfortunately, the point at which this objection loses its force seems to be exactly the point at which you could actually run the AGI and watch it self-improve, so I’m not sure it’s possible to prove hypothetical-Searle wrong in advance of running a full-blown AGI. Or is my model wrong?