This post highlights for me that we don’t have a good understanding of what things like “more rational” and “more sane” mean, in terms of what dimensions human minds tend to vary along as a result of nature, ordinary nurture, and specialized nurture of the kind CFAR is trying to do. I think more understanding here would be highly valuable, and I mostly don’t think we can get it from studies of the general population. (We can locally define “more sane” as referring to whatever properties are needed to get the right answer on this specific question, of course, but then it might not correspond to definitions of “more sane” that we’re using in other contexts.)
Not that this answers your question, but there’s a potential tension between the goal of picking people with a deep understanding of FAI issues, and the goal of picking people who are unlikely to do things like become attached to the idea of being an FAI researcher.
Your post suggests that an FAI feasibility team would be made of the same people who would then (depending on their findings) go ahead and form an actual FAI team, but does that need to be the case?
> This post highlights for me that we don’t have a good understanding of what things like “more rational” and “more sane” mean, in terms of what dimensions human minds tend to vary along as a result of nature, ordinary nurture, and specialized nurture of the kind CFAR is trying to do.
I mentioned some specific biases that seem especially likely to cause risk for an FAI team. Is that the kind of “understanding” you’re talking about, or something else?
> Your post suggests that an FAI feasibility team would be made of the same people who would then (depending on their findings) go ahead and form an actual FAI team, but does that need to be the case?
I think probably there would just be an FAI research team that is told to continually reevaluate feasibility/safety as it goes. I just called it an “FAI feasibility team” to emphasize that at the start its most important aim would be to evaluate feasibility and safety. Having an actual separate feasibility team might buy some additional overall sanity (though how, beyond the fact that attachment to being FAI researchers wouldn’t be an issue, since its members wouldn’t continue as FAI researchers either way?). It seems like there would probably be better ways to spend the extra resources if we had them, though.
> I mentioned some specific biases that seem especially likely to cause risk for an FAI team. Is that the kind of “understanding” you’re talking about, or something else?
I think that falls under my parenthetical comment in the first paragraph. Understanding what rationality-type skills would make this specific thing go well is obviously useful, but it would also be great if we had a general understanding of which rationality-type skills naturally vary together, so that we can use phrases like “more rational” and have a better idea of what they refer to across different contexts.
> It seems like there would probably be better ways to spend the extra resources if we had them, though.
Maybe? Note that if people like Holden have concerns about whether FAI is too dangerous, that might make them more likely to provide resources toward a separate FAI feasibility team than toward, say, a better FAI team, so it’s not necessarily a fixed heap of resources that we’re distributing.
> Your post suggests that an FAI feasibility team would be made of the same people who would then (depending on their findings) go ahead and form an actual FAI team, but does that need to be the case?
You’re implying the possibility of an official separation between the two teams, which seems like a good idea. Between “finding/filtering more rational people” and “social mechanisms”, I would vote for mechanisms.
If humans are irrational because of lots of bugs in our brains, it could be hard to measure how many bugs have been fixed or worked around, and how reliable these fixes or workarounds are.
There’s how strong one’s rationality is at its peak; how strong it is on average; how rare and how bad one’s lapses are; how many contexts it works in; and to what degree one’s mistakes and insights are independent of others’ (independence of mistakes means you can correct each other; independence of insights means you don’t duplicate insights). All these measures could vary orthogonally.
Good point. And, depending on your assessment of the risks involved, especially for AGI research, the severity of the lapses might be more important than the peak or even the average. A researcher who is perfectly rational (hand-waving for the moment about how we measure that) 99% of the time, but who has, say, fits of rage every so often, might be even more dangerous than a colleague who is slightly less rational on average but nonetheless stable.
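To make this concrete, here’s a toy expected-harm calculation; it’s a minimal sketch, and the lapse probability and “harm” numbers are invented purely for illustration:

```python
# Toy comparison of expected harm per decision for two hypothetical researchers.
# All numbers are made up purely for illustration; "harm" is in arbitrary units.

def expected_harm(p_lapse: float, harm_normal: float, harm_lapse: float) -> float:
    """Expected harm per decision, mixing the normal and lapse states."""
    return (1 - p_lapse) * harm_normal + p_lapse * harm_lapse

# Researcher A: highly rational 99% of the time, but with rare, severe lapses.
harm_a = expected_harm(p_lapse=0.01, harm_normal=0.1, harm_lapse=50.0)

# Researcher B: somewhat less rational on average, but stable (no lapses).
harm_b = expected_harm(p_lapse=0.0, harm_normal=0.5, harm_lapse=0.0)

print(f"A (high peak, rare severe lapses): {harm_a:.3f}")  # 0.599
print(f"B (lower peak, stable):            {harm_b:.3f}")  # 0.500
```

With these made-up numbers, the rare but severe lapses contribute more expected harm than the stable colleague’s higher baseline, which is the sense in which the lapses can dominate the comparison.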
You seem to be using rationality to refer to both bug fixes and general intelligence. I’m more concerned about bug fixes myself, for the situation Wei Dai describes. Status-related bugs seem potentially the worst.
I meant to refer to just bug fixes, I think. My comment wasn’t really responsive to yours, just prompted by it, and I should probably have added a note to that effect. One can imagine a set of bugs that become more fixed or less fixed over time, varying together in a continuous manner, depending on, e.g., what emotional state one is in. One might be more vulnerable to many bugs when sleepy, for example. One can then talk about averages and extreme values of such a “general rationality” factor in a typical decision context, and about whether there are important non-standard contexts where new bugs that one hasn’t prepared for become important. I agree that bugs related to status (and to interpersonal conflict) seem particularly dangerous.
> the severity of the lapses might be more important than the peak or even the average

Or, proper mechanism design for the research team might be able to smooth out those troughs and let you use the highest-EV researchers without danger.