I wrote this post as a summary of a paper I published. It didn’t get much attention, so I’d be interested in having you all review it.
https://www.lesswrong.com/posts/JYdGCrD55FhS4iHvY/robustness-to-fundamental-uncertainty-in-agi-alignment-1
To say a little more, I think the general approach to safety work I lay out here is worth considering more deeply, and it points toward a better process for choosing interventions when attempting to build aligned AI. What matters more than the specific examples where I apply the method is the method itself, but as best I can tell, folks haven’t engaged much with that, so it’s unclear to me whether that’s because they disagree, think it’s too obvious, or something else.
Thanks for the suggestion! It’s great to have some methodological posts!
We’ll consider it. :)