JGWeissman comments on Building toward a Friendly AI team

JGWeissman 6 Jun 2012 21:25 UTC
2 points
Some members of an FAI team should have a background in human psychology, as this is highly relevant to figuring out the Friendly utility function. However, global resource management seems like the sort of problem that could be left to the FAI to figure out.
- Vladimir_Nesov 6 Jun 2012 21:39 UTC
  3 points
  
  should have a background in human psychology, as this is highly relevant to figuring out the Friendly utility function
  
  My current opinion is that it’s completely irrelevant. The typical tools developed around the study of human psychology are vastly less accurate than necessary to do the job. Background in mathematics, physics or machine learning seems potentially much more relevant, specifically for the problem of figuring out human goals and not just for other AI-related problems.
  - jsteinhardt 7 Jun 2012 4:02 UTC
    0 points
    No matter how smart you are, looking at the data is essential. Cognitive scientists have spent a long time looking at the data of how humans think / behave, and can probably appreciate subtleties that would be missed by even the most clever mathematicians (unless those mathematicians looked at the same set of data).
    - Mitchell_Porter 7 Jun 2012 4:19 UTC
      2 points
      I believe Vladimir is thinking in terms of a general theory which could, say, take an arbitrary computational state-machine, interpret it as a decision-theoretic agent, and deduce the “state-machine it would want to be”, according to its “values”, where the phrases in quotes represent imprecise or even misleading designations for rigorous concepts yet to be identified. This would be a form of the long-sought “reflective decision theory” that gets talked about.
      
      From this perspective, the coherent extrapolation of human volition is a matter of reconstructing the human state machine through first-principles physical and computational analysis of the human brain, identifying what type of agent it is, and reflectively idealizing it according to its type and its traits. (An example of type-and-traits analysis would be 1) identifying an agent as an expected-utility maximizer (that’s its “type”), and 2) identifying its specific utility function (that’s a “trait”). But the cognitive architecture underlying human decision-making is expected to be a lot more complicated to specify.)
      
      So the paradigm really is one in which one hopes to skip over all the piecemeal ideas and empirical analysis that cognitive scientists have produced, by coming up with an analytical and extrapolative method of perfect rigor and great generality. In my opinion, people trying to develop this perfect a-priori method can still derive inspiration and knowledge from science that has already been done. But the idea is not “we can neglect existing science because our team will be smarter”, the idea is that a universal method—in the spirit of Solomonoff induction, but tractable—can be identified, which will then allow the problem to be solved with a minimum of prior knowledge.
      - jsteinhardt 7 Jun 2012 4:27 UTC
        0 points
        From an outside view, such a plan seems unlikely to succeed. Science moves forward by data; engineering moves forward by trying things out. This is just intuition, though; I would guess there is a reasonable amount of empirical evidence to be gained by looking at theoretical work and seeing how often it runs afoul of unexpected facts about the world (I’m embarrassingly unsure of what the answer would be here; added to my list of things to try to figure out).
  - JGWeissman 6 Jun 2012 21:49 UTC
    0 points
    I agree that the “typical tools developed around the study of human psychology are vastly less accurate than necessary to do the job”, but it still seems like figuring out what humans value is a problem of human psychology. I don’t see how theoretical physics has anything to do with it.
    - Vladimir_Nesov 6 Jun 2012 21:59 UTC
      4 points
      Whether it’s a “problem of human psychology” is a question of assigning an area-of-study label to the problem. The area-of-study characteristic doesn’t seem to particularly help with finding methods appropriate for solving the problem in this case. So I propose to focus on the other characteristics of the problem, namely the necessary rigor in an acceptable solution and the potential difficulty of the concepts necessary to formulate the solution (in the study of a real-world phenomenon). These characteristics match mathematics and physics best (probably more mathematics than physics).
      - JGWeissman 6 Jun 2012 22:11 UTC
        1 point
        I would expect all FAI team members to have strong math skills in addition to whatever other background they may have, and I expect them to approach the psychological aspects of the problem with greater rigor than is typical of mainstream psychology, and that their math backgrounds will contribute to this. But I think that mainstream psychology would be of some use to them, even if just to provide some concepts to be explored more rigorously.
      - prashantsohani 9 Jun 2012 17:50 UTC
        0 points
        
        the potential difficulty of the concepts necessary to formulate the solution
        
        As I see it, there might be considerable conceptual difficulty in formulating even the exact problem statement. For instance, given that we want a ‘friendly’ AI, our problem statement very much depends on our notion of friendliness, hence the necessity of including psychology.
        
        Going further, considering that SI aims to minimize AI risk, we need to be clear on which AI behavior is said to constitute a ‘risk’. If I remember correctly, the AI in the movie “I, Robot” inevitably concludes that killing the human race is the only way to save the planet. The definition of risk in such a scenario is a very delicate problem.