There’s how strong one’s rationality is at its peak; how strong it is on average; how rare and how bad its lapses are; how many contexts it works in; and to what degree one’s mistakes and insights are independent of others’ (independence of mistakes means you can correct each other; independence of insights means you don’t duplicate insights). All these measures could vary orthogonally.
Good point. And, depending on your assessment of the risks involved, especially for AGI research, the severity of the lapses might matter more than the peak or even the average. A researcher who is perfectly rational (hand-waving for the moment about how we measure that) 99% of the time but has, say, occasional fits of rage might be even more dangerous than a colleague who is slightly less rational on average but nonetheless stable.
Or, proper mechanism design for the research team might be able to smooth out those troughs and let you use the highest-EV researchers without danger.
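To make the tradeoff in the two comments above concrete, here is a minimal back-of-the-envelope sketch in Python. The `expected_harm` helper and all the numbers are made up purely for illustration, not taken from the discussion; the point is only that a small probability of a very bad lapse can dominate the comparison in expectation.

```python
# Toy expected-harm comparison (illustrative numbers only).
# Researcher A: highly rational 99% of the time, but rare severe lapses.
# Researcher B: slightly less rational on average, but stable.

def expected_harm(p_lapse: float, harm_in_lapse: float, baseline_harm: float) -> float:
    """Expected harm per decision, mixing baseline mistakes with rare lapses."""
    return (1 - p_lapse) * baseline_harm + p_lapse * harm_in_lapse

# Hypothetical units of "badness" per decision.
a = expected_harm(p_lapse=0.01, harm_in_lapse=1000.0, baseline_harm=1.0)  # peak performer, rare fits of rage
b = expected_harm(p_lapse=0.0,  harm_in_lapse=0.0,    baseline_harm=5.0)  # steadier but less sharp colleague

print(f"Researcher A expected harm per decision: {a:.2f}")  # ~10.99
print(f"Researcher B expected harm per decision: {b:.2f}")  # 5.00
```

On these made-up numbers the rare-lapse researcher is worse in expectation; mechanism design that truncates the worst-case harm during a lapse would flip that conclusion.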
You seem to be using rationality to refer to both bug fixes and general intelligence. I’m more concerned about bug fixes myself, for the situation Wei Dai describes. Status-related bugs seem potentially the worst.
I meant to refer just to bug fixes, I think. My comment wasn’t really responsive to yours, just prompted by it, and I should probably have added a note to that effect. One can imagine a set of bugs that become more or less fixed over time, varying together in a continuous manner depending on, e.g., what emotional state one is in; one might be more vulnerable to many bugs when sleepy, for example. One can then talk about averages and extreme values of such a “general rationality” factor in a typical decision context, and about whether there are important non-standard contexts where new bugs one hasn’t prepared for come into play. I agree that bugs related to status (and to interpersonal conflict) seem particularly dangerous.