This post highlights for me that we don’t have a good understanding of what things like “more rational” and “more sane” mean, in terms of what dimensions human minds tend to vary along as a result of nature, ordinary nurture, and specialized nurture of the kind CFAR is trying to do. I think more understanding here would be highly valuable, and I mostly don’t think we can get it from studies of the general population. (We can locally define “more sane” as referring to whatever properties are needed to get the right answer on this specific question, of course, but then it might not correspond to definitions of “more sane” that we’re using in other contexts.)
Not that this answers your question, but there’s a potential tension between the goal of picking people with a deep understanding of FAI issues, and the goal of picking people who are unlikely to do things like become attached to the idea of being an FAI researcher.
Your post suggests that an FAI feasibility team would be made of the same people who would then (depending on their findings) go ahead and form an actual FAI team, but does that need to be the case?
> This post highlights for me that we don’t have a good understanding of what things like “more rational” and “more sane” mean, in terms of what dimensions human minds tend to vary along as a result of nature, ordinary nurture, and specialized nurture of the kind CFAR is trying to do.
I mentioned some specific biases that seem especially likely to cause risk for an FAI team. Is that the kind of “understanding” you’re talking about, or something else?
> Your post suggests that an FAI feasibility team would be made of the same people who would then (depending on their findings) go ahead and form an actual FAI team, but does that need to be the case?
I think probably there would just be an FAI research team that is told to continually reevaluate feasibility/safety as it goes. I just called it an “FAI feasibility team” to emphasize that at the start its most important aim would be to evaluate feasibility and safety. Having an actual separate feasibility team might buy some additional overall sanity (though how, beyond the fact that attachment to being FAI researchers wouldn’t be an issue, since its members wouldn’t continue as FAI researchers either way?). It seems like there would probably be better ways to spend the extra resources if we had them, though.
> I mentioned some specific biases that seem especially likely to cause risk for an FAI team. Is that the kind of “understanding” you’re talking about, or something else?
I think that falls under my parenthetical comment in the first paragraph. Understanding what rationality-type skills would make this specific thing go well is obviously useful, but it would also be great if we had a general understanding of which rationality-type skills naturally vary together, so that we can use phrases like “more rational” and have a better idea of what they refer to across different contexts.
> It seems like there would probably be better ways to spend the extra resources if we had them, though.
Maybe? Note that if people like Holden have concerns about whether FAI is too dangerous, that might make them more likely to provide resources toward a separate FAI feasibility team than toward, say, a better FAI team, so it’s not necessarily a fixed heap of resources that we’re distributing.
> Your post suggests that an FAI feasibility team would be made of the same people who would then (depending on their findings) go ahead and form an actual FAI team, but does that need to be the case?
You’re implying the possibility of an official separation between the two teams, which seems like a good idea. Between “finding/filtering more rational people” and “social mechanisms”, I would vote for mechanisms.
If humans are irrational because of lots of bugs in our brains, it could be hard to measure how many bugs have been fixed or worked around, and how reliable these fixes or workarounds are.
There’s how strong one’s rationality is at its peak; how strong it is on average; how rare and how bad one’s lapses are; how many contexts it works in; and to what degree one’s mistakes and insights are independent of others’ (independence of mistakes means you can correct each other; independence of insights means you don’t duplicate insights). All these measures could vary orthogonally.
Good point. And, depending on your assessment of the risks involved, especially for AGI research, the severity of the lapses might be more important than the peak or even the average. A researcher who is perfectly rational (hand-waving for the moment about how we measure that) 99% of the time, but who has, say, fits of rage every so often, might be even more dangerous than a colleague who is slightly less rational on average but nonetheless stable.
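To make this concrete, here’s a toy expected-harm calculation; it’s a minimal sketch, and the lapse probability and “harm” numbers are invented purely for illustration:

```python
# Toy comparison of expected harm per decision for two hypothetical researchers.
# All numbers are made up purely for illustration; "harm" is in arbitrary units.

def expected_harm(p_lapse: float, harm_normal: float, harm_lapse: float) -> float:
    """Expected harm per decision, mixing the normal and lapse states."""
    return (1 - p_lapse) * harm_normal + p_lapse * harm_lapse

# Researcher A: highly rational 99% of the time, but with rare, severe lapses.
harm_a = expected_harm(p_lapse=0.01, harm_normal=0.1, harm_lapse=50.0)

# Researcher B: somewhat less rational on average, but stable (no lapses).
harm_b = expected_harm(p_lapse=0.0, harm_normal=0.5, harm_lapse=0.0)

print(f"A (high peak, rare severe lapses): {harm_a:.3f}")  # 0.599
print(f"B (lower peak, stable):            {harm_b:.3f}")  # 0.500
```

With these made-up numbers, the rare but severe lapses contribute more expected harm than the stable colleague’s higher baseline, which is the sense in which the lapses can dominate the comparison.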
You seem to be using rationality to refer to both bug fixes and general intelligence. I’m more concerned about bug fixes myself, for the situation Wei Dai describes. Status-related bugs seem potentially the worst.
I meant to refer to just bug fixes, I think. My comment wasn’t really responsive to yours, just prompted by it, and I should probably have added a note to that effect. One can imagine a set of bugs that become more fixed or less fixed over time, varying together in a continuous manner, depending on, e.g., what emotional state one is in. One might be more vulnerable to many bugs when sleepy, for example. One can then talk about averages and extreme values of such a “general rationality” factor in a typical decision context, and about whether there are important non-standard contexts where new bugs that one hasn’t prepared for become important. I agree that bugs related to status (and to interpersonal conflict) seem particularly dangerous.
> the severity of the lapses might be more important than the peak or even the average

Or, proper mechanism design for the research team might be able to smooth out those troughs and let you use the highest-EV researchers without danger.