I think experts’ opinions on the possibility of AI self-improvement may covary with their awareness of work on formal, machine-representable concepts of optimal AI design, particularly Solomonoff induction, including its application to reinforcement learning as in AIXI, and variations of Levin search such as Hutter’s algorithm M and Gödel machines. If an expert is unaware of those concepts, this unawareness may serve to explain away the expert’s belief that there are no approaches to engineering self-improvement-capable AI on any foreseeable horizon.
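For concreteness, here is a compressed sketch of the first two of these concepts, following Hutter’s standard presentation (the rendering below is mine): Solomonoff induction weights every program consistent with the observed sequence by its description length, and AIXI embeds that universal mixture in an expectimax planner over future actions, observations, and rewards.

```latex
% Solomonoff's universal prior: sum over (minimal) programs p whose output on a
% universal monotone Turing machine U begins with the observed string x.
M(x) = \sum_{p \,:\, U(p) = x*} 2^{-\ell(p)}

% AIXI: choose the action a_k maximizing expected total reward to horizon m,
% with the environment modeled by the same length-weighted mixture over programs q.
a_k := \arg\max_{a_k} \sum_{o_k r_k} \cdots \max_{a_m} \sum_{o_m r_m}
       \big[\, r_k + \cdots + r_m \,\big]
       \sum_{q \,:\, U(q,\, a_1 \ldots a_m) = o_1 r_1 \ldots o_m r_m} 2^{-\ell(q)}
```

Whether any of these constructions is practically computable is a separate question; they matter here only as formal, machine-representable targets that an expert either does or does not have cached.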
If it’s not too late, you should probably include in your questionnaires a question to judge the expert’s awareness of these concepts, such as:
“Qn: Are you familiar with formal concepts of optimal AI design which relate to searches over complete spaces of computable hypotheses or computational strategies, such as Solomonoff induction, Levin search, Hutter’s algorithm M, AIXI, or Gödel machines?”
...bearing in mind that the presence of such a question may affect their other answers.
(This was part of what I was getting at with my analysis of the AAAI panel interim report: “What cached models of the planning abilities of future machine intelligences did the academics have available [...]?” “What fraction of the academics are aware of any current published AI architectures which could reliably reason over plans at the level of abstraction of ‘implement a proxy intelligence’?”)
Other errors which might explain away an expert’s unconcern for AI risk are:
incautious thinking about the full implications of a given optimization criterion or motivational system;
when considering AI self-improvement scenarios, incautious thinking about parameter uncertainty and structural uncertainty in economic descriptions of computational complexity costs and efficiency gains over time (particularly given that a general AI will be motivated to investigate many different possible structures for its self-improvement process, including structures one may not oneself have considered, in order to choose a process whose economics are as favorable as possible; see the sketch after this list); and
incomplete reasoning about options for gathering information about the technical factors affecting AI risk scenarios, when weighing the relative costs of delaying AI safety projects until better information is available. The implicit expectation is that, if the technical factors turn out to imply safety, delaying will have spared the cost of the AI safety projects; and, more viscerally, that having advocated delay will spare one’s own prestige (unthinkingly taken as a proxy for correctness), whereas having failed to advocate an immediate start to AI safety projects could not cost one’s prestige in any event.
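To make the parameter-versus-structural distinction in the second item concrete, here is a toy Monte Carlo sketch; every model form and number in it is hypothetical, chosen only to illustrate that propagating parameter uncertainty within a single assumed growth model understates the spread of outcomes relative to also sampling over which model form applies.

```python
import random

# Two hypothetical models of cumulative capability after `rounds` of
# self-improvement, differing in how complexity costs tax efficiency gains.
# The functional forms are illustrative only.
def diminishing_returns(gain, cost, rounds):
    # Per-round gains shrink as accumulated complexity costs grow.
    capability = 1.0
    for k in range(rounds):
        capability *= 1.0 + gain / (1.0 + cost * k)
    return capability

def sustained_returns(gain, cost, rounds):
    # Per-round gains are only mildly taxed by complexity costs.
    capability = 1.0
    for _ in range(rounds):
        capability *= 1.0 + gain * (1.0 - cost)
    return capability

def simulate(structural_uncertainty, trials=10_000, rounds=30):
    results = []
    for _ in range(trials):
        # Parameter uncertainty: sample efficiency gains and complexity costs.
        gain = random.uniform(0.05, 0.30)
        cost = random.uniform(0.05, 0.50)
        # Structural uncertainty: also sample which model form applies.
        if structural_uncertainty and random.random() < 0.5:
            model = sustained_returns
        else:
            model = diminishing_returns
        results.append(model(gain, cost, rounds))
    results.sort()
    return results[len(results) // 2], results[int(len(results) * 0.99)]

median_p, tail_p = simulate(structural_uncertainty=False)
median_s, tail_s = simulate(structural_uncertainty=True)
print(f"parameter uncertainty only:  median={median_p:.1f}, 99th pct={tail_p:.1f}")
print(f"plus structural uncertainty: median={median_s:.1f}, 99th pct={tail_s:.1f}")
```

The parenthetical point in that item then roughly amounts to this: the AI gets to select from the favorable tail of the structural distribution rather than from its median, which is exactly the part an analysis restricted to familiar structures leaves out.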
However, it’s harder to find uncontroversial questions which would be diagnostic of these errors.
Perhaps an expert’s beliefs about the costs of better information and the costs of delay might be assessed with a willingness-to-pay question, such as a tradeoff involving a hypothetical benefit to everyone now living on Earth which could be sacrificed to gain hypothetical perfect understanding of some technical unknowns related to AI risks, or a hypothetical benefit gained at the cost of perfect future helplessness against AI risks. However, even this sort of question might seem to frame things hyperbolically.
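For what it’s worth, the implicit comparison behind both the delay error above and such a willingness-to-pay question can be written out as a deliberately crude toy; the symbols are mine, with p the expert’s probability that the technical factors imply safety, C_proj the cost of the safety projects, and L_late the loss from having started them too late.

```latex
\mathbb{E}[\text{cost} \mid \text{start now}] = C_{\mathrm{proj}},
\qquad
\mathbb{E}[\text{cost} \mid \text{delay}] = (1 - p)\,\big(C_{\mathrm{proj}} + L_{\mathrm{late}}\big)

% Delay is the cheaper option only when
(1 - p)\, L_{\mathrm{late}} \;<\; p\, C_{\mathrm{proj}}
```

On this reading, the error described above consists in attending to the p branch (and to its prestige analogue) while leaving the (1 − p)·L_late term unexamined.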