Reflecting on this more, I wrote in a Discord server (then edited to post here):
I wasn’t aware the concept of pivotal acts was entangled with the frame of formal inner+outer alignment as the only (or only feasible?) way to cause safe ASI.
I suspect that by default, someone operating in that frame and I might each believe the other's agenda to be probably doomed. This could make discussion more valuable (as in that case, at least one of us should make a large update).
For anyone interested in trying that discussion, I’d be curious what you think of the post linked above. As a comment on it says:
I found myself coming back to this now, years later, and feeling like it is massively underrated. Idk, it seems like the concept of training stories is great and much better than e.g. “we have to solve inner alignment and also outer alignment” or “we just have to make sure it isn’t scheming.”
In my view, solving formal inner alignment, i.e. devising a general method to create ASI with any specified output-selection policy, is hard enough that I don’t expect it to be done.[1] This is why I’ve been focusing on other approaches which I believe are more likely to succeed.
Though I encourage anyone who understands the problem and thinks they can solve it to try to prove me wrong! I can sure see some directions and I think a very creative human could solve it in principle. But I also think a very creative human might find a different class of solution that can be achieved sooner. (Like I’ve been trying to do :)