Thanks. I wasn’t super satisfied with the way I phrased my questions. I just made some slight edits to them (labeled as such), although they still don’t feel like they quite do the thing. (I feel like I’m looking at a bunch of subtle frame disconnects, while multiple other frame disconnects are going on, so pinpointing the thing is hard.)
I think “is any of this actually cruxy” is maybe the most important question and I should have included it. You answered “not supermuch, at least compared to models of intelligence”. Do you think there’s any similar nearby thing that feels more relevant on your end?
In any case, thanks for your answers. They do help give me more of a sense of the gestalt of your worldview here, however relevant it is.
It’s definitely cruxy in the sense that changing my opinions on any of these would shift my p(doom) some amount.
My rough model is that there’s an unknown quantity about reality which is roughly “how strong does the oversight process have to be before the trained model does what the oversight process intended for it to do”. p(doom) mainly depends on whether the actors training the powerful systems have sufficiently powerful oversight processes. This seems primarily affected by the quality of technical alignment solutions, but certainly civilizational adequacy also affects the answer.