some understanding here may be more dangerous than no understanding, precisely because it’s enough to accomplish some things without accomplishing everything that you needed to.
Fwiw, under the worldview I’m outlining, this sounds like a “clever argument” to me, one that I would expect, on priors, to be less likely to be true, regardless of my position on takeoff. (Takeoff does matter, in that I expect that this worldview is not very accurate/good if there’s discontinuous takeoff, but imputing the worldview I don’t think takeoff matters.)
I often think of this as penalizing nth-order effects in proportion to some quickly-growing function of n. (Warning: I’m using the phrase “nth-order effects” in a non-standard, non-technical way.)
Under the worldview I mentioned, the first-order effect of better understanding of AI systems is that you are more likely to build AI systems that are useful and do what you want.
The second-order effect is “maybe there’s a regime where you can build capable-but-not-safe things; if we’re currently below that regime, it’s bad to move up into it”. This requires a more complicated model of the world (given this worldview) and more assumptions about where we are.
(Also, now that I’ve written this out: this model predicts there’s no chance of solving alignment, because we’ll reach the capable-but-not-safe regime first, and die. Probably the best thing to do on this model is to race ahead on understanding as fast as possible and hope we leapfrog directly to the capable-and-safe regime? Or to work on understanding AI in secret, and only release once you know how to build capable-and-safe systems, so that no one has the chance to build capable-but-not-safe ones? You can see why this argument feels a bit off under the worldview I outlined.)
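To make the “penalize nth-order effects” heuristic above a bit more concrete, here is a toy numerical sketch (my own illustration, not something from the comment). The exponential penalty BASE ** n and the effect sizes are made-up assumptions; any quickly-growing function of n would play the same role.

```python
# Toy sketch of the "penalize nth-order effects" heuristic.
# BASE and the effect sizes are illustrative assumptions only.

BASE = 10.0  # hypothetical: each extra order of indirectness costs a factor of 10


def discounted_total(effects):
    """effects[n-1] is a rough signed estimate of the nth-order effect."""
    return sum(e / BASE ** n for n, e in enumerate(effects, start=1))


# First-order effect of better understanding: more useful, do-what-you-want systems (+1).
# Second-order effect: maybe it pushes us into a capable-but-not-safe regime (-1).
print(discounted_total([+1.0, -1.0]))  # 0.09 > 0: the first-order term dominates
```

Under this kind of weighting, a second-order worry has to be much larger than the first-order benefit before it flips the overall sign, which is the intuition behind distrusting “clever arguments”.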
Takeoff does matter, in that I expect that this worldview is not very accurate/good if there’s discontinuous takeoff, but imputing the worldview I don’t think takeoff matters.
Minor question: could you clarify what you mean by “imputing the worldview” here? Do you mean something like, “operating within the worldview”? (I ask because this doesn’t seem to be a use of “impute” that I’m familiar with.)
Do you mean something like, “operating within the worldview”?
Basically yes. Longer version: “Suppose we were in scenario X. Normally, in such a scenario, I would discard this worldview, or put low weight on it, because reason Y. But suppose by fiat that I continue to use the worldview, with no other changes made to scenario X. Then …”
It’s meant to be analogous to imputing a value in a causal Bayes net, where you simply “suppose” that some event happened, and don’t update on anything causally upstream, but only reason forward about things that are causally downstream. (I seem to recall Scott Garrabrant writing a good post on this, but I can’t find it now. ETA: Found it, it’s here, but it doesn’t use the term “impute” at all. I’m now worried that I literally made up the term, and it doesn’t actually have any existing technical meaning.)
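For concreteness, here is a minimal numerical sketch of that distinction (my own toy example; the two-variable net U → X and the probabilities are made up). Conditioning on an observation X = x shifts beliefs about the upstream cause U, whereas imputing/intervening on X leaves the prior on U untouched and only propagates forward:

```python
# Conditioning vs. imputing (intervening) on X in a tiny causal Bayes net U -> X.
# All numbers are made up for illustration.

P_U = {0: 0.7, 1: 0.3}                      # prior over the upstream cause U
P_X_given_U = {0: {0: 0.9, 1: 0.1},         # P(X = x | U = u)
               1: {0: 0.2, 1: 0.8}}


def p_u_after_conditioning(u, x):
    """P(U = u | X = x): observing X updates our belief about its cause."""
    joint = P_U[u] * P_X_given_U[u][x]
    marginal = sum(P_U[v] * P_X_given_U[v][x] for v in P_U)
    return joint / marginal


def p_u_after_imputing(u, x):
    """P(U = u | do(X = x)): X is set by fiat, so upstream beliefs are unchanged."""
    return P_U[u]


print(p_u_after_conditioning(1, 1))  # ~0.774 -- belief in U = 1 goes up
print(p_u_after_imputing(1, 1))      # 0.3    -- prior on U is untouched
```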
I was going to write a comment here, but it got a bit long so I made a post instead.
Aha! I thought it might be borrowing language from some technical term I wasn’t familiar with. Thanks!