Really appreciate you sharing your honest thoughts here, Rekrul.
From my side, I’d value actually discussing the reasoning forms and steps we already started to outline on the forum. For example, the relevance of intrinsic vs. extrinsic selection and correction, or the relevance of the organic vs. artificial substrate distinction. These are distinctions I would love to openly chat about with you (not the formal reasoning itself – I’m the bridge-builder, Forrest is the theorist).
That might feel unsatisfactory – in the sense of “why don’t you just give us the proof now?”
As far as I can tell (Forrest can correct me later), there are at least two key reasons:
1. There is a tendency amongst AI Safety researchers to want to cut to the chase and judge the believability of the conclusion itself. For example, notice that I tried to clarify several argument parts in comment exchanges with Paul, with little or no response. People tend to assume that this would be the same as judging a maths proof over idealised deterministic and countable spaces. Yet the formal reasoning here has to reference and build up premises from physical theory in indeterministic settings. So we actually need to clarify how a different form of formal reasoning is required here, one that does not look like what would be required for P=NP. Patience is needed on the side of our interlocutors.
2. While Forrest does have most of the argument parts formalised, his use of precise analytical language and premises is not going to be immediately clear to you. Mathematicians are not the only people who use formal language and reasoning steps to prove impossibilities by contradiction. Some analytical philosophers do too (as do formal verification researchers in industrial software engineering, using different notation for logic transformations, etc.). No amount of “just give the proof to us and leave it to us to judge” lends us confidence that the judging would track the reasoning steps – not if those people did not already track the correspondences of the first basic argument parts described in the explanatory writings by Forrest or me that their comments referred to. Even an accomplished mathematician is not going to grasp the argumentation by skimming through the text, judging it based on their preconception of what language the terms should be described in or how the formal reasoning should be structured.
I get that people are busy, but this is how it is. We are putting a lot of effort and time into communication (and are very happy to get your feedback on that!). To make this work, they (or others) will need to put in commensurate effort on their end. It is up to them to show that they are not making inconsistent jumps in reasoning, or talking in terms of their intuitive probability predictions about the believability of the end result when we should be talking about binary logic transformations.
And actually, such nitty-gritty conversations would be really helpful for us too! Here is what I wrote before in response to another person’s question about whether a public proof is available:
The main bottleneck is (re)writing it in a language that AI(S) researchers will understand without having to do a lot of reading/digging into the definitions of terms and the descriptions of axioms/premises. A safety impossibility theorem can be constructed from various forms that are either isomorphic with each other or use separate arguments (e.g. different theoretical limits covering different scopes of AGI interaction) to arrive at what seems to be an overdetermined conclusion (that long-term AGI safety is not possible).
We don’t want to write it out so long that most or all readers drop out before they get to parse the key reasoning steps. But we also do not want to make it so brief and dense that researchers are confused about what level of generality we’re talking at, have to read through other referenced literature to understand definitions, etc.
Also, one person (a grant investigator) warned us that AI safety researchers would be so motivated against the conclusion (see ‘belief bias’) that few would actually attempt to read through a formal safety impossibility theorem. That is indeed likely, based on my exchanges so far with AIS researchers (many of them past organisers or participants of AISC). So that is basically why we are first writing a condensed summary (for the Alignment Forum and beyond) that orders the main arguments for long-term AGI safety impossibility, without precisely describing all the axioms and definitions of terms used, covering all the reasoning gaps to ensure logical consistency, etc.
Note: Forrest has a background in analytical philosophy; he does not write in mathematical notation. Another grant investigator we called with expected the formal reasoning to necessarily be written out in mathematical notation (here is a rough post-call write-up consolidating our impressions of and responses to that conversation: https://mflb.com/ai_alignment_1/math_expectations_psr.html).
Also note that Forrest’s formal reasoning work was funded by a $170K grant from the Survival and Flourishing Fund. So some grant investigators were willing to bet on this work with money.
One thing Paul talks about constantly is how useful it would be if he had some hard evidence a current approach is doomed, as it would allow the community to pivot. A proof of alignment impossibility would probably make him ecstatic if it was correct (even if it puts us in quite a scary position).
I respect that take by Paul a lot. This is also how I started to think about it a year ago.