At the moment, I am particularly interested in the structure of proposed x-risk models (more than the specific conclusions). Lately, there has been a lot of attention on Carlsmith-style decompositions, which have the form “Catastrophe occurs if this conjunction of events occurs”. I found it interesting that this post took the upside-down version of that, i.e., “Catastrophe is inevitable unless (one of) these things happens”.
Why do I find this distinction relevant? Consider how uninformed our assessments of most of the factors in these models actually are. Once you include this meta-uncertainty in a conjunctive model, Jensen’s inequality implies that the mean & median risk of catastrophe decrease with greater meta-uncertainty. When you turn the argument upside down, you also invert Jensen’s inequality, such that the mean & median risk of catastrophe would increase with greater meta-uncertainty.
This is a complex topic that has more nuances than are warranted in a comment. I’ll mention that Michael’s actual argument uses “unless **one of** these things happens”, which is additive (thus not subject to the Jensen’s inequality phenomenon). But because the model is structured in this polarity, Jensen’s would kick in as soon as he introduces a conjunction of factors, which I see as something that would naturally occur with this style of model structure. Even though this post’s argument is additive, I give this article credit for triggering this insight for me.
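
A minimal Monte Carlo sketch of the two polarities, under assumptions that are mine and not taken from either model: six independent factors, each with a hypothetical point estimate of 0.5, and meta-uncertainty modeled as a Beta distribution whose mean is held fixed while its concentration shrinks. The helper `sample_conjunction` and all the numbers are illustrative only. Because the means are held fixed and the factors are independent, the mean of the product does not move in this particular setup, so the sketch only illustrates the median part of the claim.

```python
import random
import statistics


def sample_conjunction(means, concentration, n_samples, rng):
    """Sample the product of independent Beta-distributed factor probabilities.

    Each factor keeps its mean fixed at `m`; a smaller `concentration`
    spreads the Beta distribution out, i.e. more meta-uncertainty.
    """
    samples = []
    for _ in range(n_samples):
        product = 1.0
        for m in means:
            product *= rng.betavariate(m * concentration, (1.0 - m) * concentration)
        samples.append(product)
    return samples


rng = random.Random(0)   # fixed seed so the run is repeatable
means = [0.5] * 6        # hypothetical point estimates, not from any real model

low_uncertainty = sample_conjunction(means, 200.0, 20000, rng)
high_uncertainty = sample_conjunction(means, 2.0, 20000, rng)

# Conjunctive polarity: catastrophe requires every factor to occur.
median_low = statistics.median(low_uncertainty)
median_high = statistics.median(high_uncertainty)

# Inverted polarity: catastrophe unless every safeguard holds, so the same
# product is now the chance of avoiding catastrophe.
inverted_low = statistics.median([1.0 - x for x in low_uncertainty])
inverted_high = statistics.median([1.0 - x for x in high_uncertainty])

print("conjunctive median risk:", median_low, "->", median_high)
print("inverted median risk:   ", inverted_low, "->", inverted_high)
```

In this setup the median of the conjunctive risk falls as the meta-uncertainty grows, while the median of the inverted risk rises, because the expected log of each factor drops below the log of its mean. A different choice of distribution (for example, holding each factor's median fixed in log-odds space) would change the details.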
Also, thank you for your Appendices A & B that describe your opinions about which approaches to alignment you see as promising and non-promising.
Great comment, this clarified the distinction between these arguments for me. And IMO this (Michael’s) argument is obviously the correct way to look at it.