I really appreciated this—it felt like better access to the ‘MIRI strategic viewpoint’ than I think I’ve had in the past.
I found it more clarifying than updating for me, with a couple of tangible exceptions:
First, I really like the response to coronavirus as an example of a trillion-dollar warning shot. I think I've previously agreed that responses to past disasters were better, but the more recent example should be (all else equal) more informative than the other ones.
Second, this point about factored cognition:
Paul Christiano occasionally floats proposals of (what looks to me like) deferential cognitive systems that are too incapable to be scary, being composed into highly capable cognitive systems that inherit a deference property from their parts. (Paul might not endorse this gloss.) I basically expect the cognition to not compose to something capable, and insofar as it does I basically expect it not to inherit the deference property, and so I have little optimism for such approaches. But it’s possible that Joe does, and that as such, the second bullet point above is doing a bunch of work for him that it’s not doing for me.
I think this is pretty crux-y for me, and I wonder if it’s crux-y for MIRI. This feels very close to the heart of one of my research questions, and if there were strong cases for this, I’d like to hear them.
(My research is less about the whole inheriting deference from its parts and more about it inheriting transparency/interpretability. I expect the two to be basically the same with regard to this non-combination.)