should have a background in human psychology, as this is highly relevant to figuring out the Friendly utility function
My current opinion is that it’s completely irrelevant. The typical tools developed around the study of human psychology are vastly less accurate than necessary to do the job. A background in mathematics, physics, or machine learning seems potentially much more relevant, specifically for the problem of figuring out human goals, not just for other AI-related problems.
No matter how smart you are, looking at the data is essential. Cognitive scientists have spent a long time looking at the data of how humans think and behave, and can probably appreciate subtleties that would be missed by even the most clever mathematicians (unless those mathematicians looked at the same set of data).
I believe Vladimir is thinking in terms of a general theory which could, say, take an arbitrary computational state-machine, interpret it as a decision-theoretic agent, and deduce the “state-machine it would want to be”, according to its “values”, where the phrases in quotes represent imprecise or even misleading designations for rigorous concepts yet to be identified. This would be a form of the long-sought “reflective decision theory” that gets talked about.
From this perspective, the coherent extrapolation of human volition is a matter of reconstructing the human state machine through first-principles physical and computational analysis of the human brain, identifying what type of agent it is, and reflectively idealizing it according to its type and its traits. (An example of type-and-traits analysis would be (1) identifying an agent as an expected-utility maximizer—that’s its “type”—and (2) identifying its specific utility function—that’s a “trait”. But the cognitive architecture underlying human decision-making is expected to be a lot more complicated to specify.)
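As a toy illustration of the type-and-traits idea (the outcomes, observed choices, and numbers below are invented for the example, not part of any actual proposal): fix the agent’s “type” as expected-utility maximization over a handful of outcomes, and search for “trait” utility functions consistent with its observed choices.

```python
import itertools

OUTCOMES = ["apple", "banana", "cherry"]

# Observed data: (lottery_a, lottery_b, choice). A lottery maps outcomes to
# probabilities; choice records which lottery the agent picked.
observations = [
    ({"apple": 1.0}, {"banana": 1.0}, "a"),                 # prefers apple to banana
    ({"banana": 1.0}, {"cherry": 1.0}, "a"),                 # prefers banana to cherry
    ({"apple": 0.5, "cherry": 0.5}, {"banana": 1.0}, "b"),   # 50/50 apple/cherry gamble loses to banana
]

def expected_utility(lottery, utility):
    return sum(p * utility[o] for o, p in lottery.items())

def consistent(utility):
    """Does this candidate utility function reproduce every observed choice?"""
    for lot_a, lot_b, choice in observations:
        ua = expected_utility(lot_a, utility)
        ub = expected_utility(lot_b, utility)
        if (choice == "a" and ua <= ub) or (choice == "b" and ub <= ua):
            return False
    return True

# Brute-force the "trait": search a coarse grid of utility assignments for
# functions consistent with the data, given the assumed "type".
grid = [i / 10 for i in range(11)]
fits = [dict(zip(OUTCOMES, vals))
        for vals in itertools.product(grid, repeat=len(OUTCOMES))
        if consistent(dict(zip(OUTCOMES, vals)))]

print(len(fits), "grid utility functions fit the observed choices; one example:")
print(fits[0])
```

Even in this toy setting, many distinct utility functions fit the same choices, which hints at how much worse the underdetermination gets for the actual human cognitive architecture.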
So the paradigm really is one in which one hopes to skip over all the piecemeal ideas and empirical analysis that cognitive scientists have produced, by coming up with an analytical and extrapolative method of perfect rigor and great generality. In my opinion, people trying to develop this perfect a-priori method can still derive inspiration and knowledge from science that has already been done. But the idea is not “we can neglect existing science because our team will be smarter”, the idea is that a universal method—in the spirit of Solomonoff induction, but tractable—can be identified, which will then allow the problem to be solved with a minimum of prior knowledge.
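To gesture at what “in the spirit of Solomonoff induction, but tractable” might look like, here is a purely hypothetical sketch with made-up hypotheses and numbers: weight each candidate model of the agent by a simplicity prior of 2^(-description length) times how well it predicts the observed behavior, so that an overfitted lookup table of the agent’s past actions gets negligible weight even though it “predicts” perfectly.

```python
# Hypothetical models of the agent, each with a crude description length (bits)
# and a likelihood of the observed behavior under that model. All numbers are
# made up for illustration.
hypotheses = [
    {"name": "maximizes money",             "bits": 10, "likelihood": 0.20},
    {"name": "maximizes money and leisure", "bits": 25, "likelihood": 0.60},
    {"name": "lookup table of past acts",   "bits": 90, "likelihood": 1.00},
]

def weight(h):
    # Simplicity prior 2^-bits, times likelihood of the data under the model.
    return 2.0 ** -h["bits"] * h["likelihood"]

total = sum(weight(h) for h in hypotheses)
for h in hypotheses:
    print(h["name"], round(weight(h) / total, 8))
# With this little data the simplest model dominates; as likelihoods are
# multiplied over more observations, better-predicting models catch up.
```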
From an outside view, such a plan seems unlikely to succeed. Science moves forward by data, engineering moves forward by trying things out. This is just intuition, though; I would guess there is a reasonable amount of empirical evidence to be gained by looking at theoretical work and seeing how often it runs afoul of unexpected facts about the world (I’m embarrassingly unsure what the answer would be here; added to my list of things to try to figure out).
I agree that the “typical tools developed around the study of human psychology are vastly less accurate than necessary to do the job”, but it still seems like figuring out what humans value is a problem of human psychology. I don’t see how theoretical physics has anything to do with it.
Whether it’s a “problem of human psychology” is a question of assigning an area-of-study label to the problem. The area-of-study characteristic doesn’t seem to particularly help with finding methods appropriate for solving the problem in this case. So I propose to focus on the other characteristics of the problem, namely the necessary rigor in an acceptable solution and the potential difficulty of the concepts necessary to formulate the solution (in the study of a real-world phenomenon). These characteristics match mathematics and physics best (probably more mathematics than physics).
I would expect all FAI team members to have strong math skills in addition to whatever other background they may have, and I expect them to approach the psychological aspects of the problem with greater rigor than is typical of mainstream psychology, and that their math backgrounds will contribute to this. But I think that mainstream psychology would be of some use to them, even if just to provide some concepts to be explored more rigorously.
the potential difficulty of the concepts necessary to formulate the solution
As I see it, there might be considerable conceptual difficulty in formulating even the exact problem statement. For instance, given that we want a ‘friendly’ AI, our problem statement very much depends on our notion of friendliness; hence the necessity of including psychology.
Going further, considering that SI aims to minimize AI risk, we need to be clear on what kind of AI behavior constitutes a ‘risk’. If I remember correctly, the AI in the movie “I, Robot” inevitably concludes that killing the human race is the only way to save the planet. The definition of risk in such a scenario is a very delicate problem.