I’m a researcher on the technical governance team at MIRI.
Views expressed are my own and should not be taken to represent official MIRI positions; views within the technical governance team also vary.
Previously:
Helped with MATS, running the technical side of the London extension (pre-LISA).
Worked for a while on Debate (this kind of thing).
Quick takes on the above:
I think MATS is great-for-what-it-is. My misgivings relate to high-level direction.
Worth noting that PIBBSS exists, and is philosophically closer to my ideal.
The technical AISF course doesn’t have the emphasis I’d choose (which would be closer to Key Phenomena in AI Risk). It’s a decent survey of current activity, but only implicitly gets at fundamentals—mostly through a [notice what current approaches miss, and will continue to miss] mechanism.
I don’t expect research on Debate, or scalable oversight more generally, to help significantly in reducing AI x-risk. (I may be wrong! Some elaboration in this comment thread.)
The main case for optimism on human-human alignment under extreme optimization seems to be indirection: not that [what I want] and [what you want] happen to be sufficiently similar, but that there’s a [what you want] pointer within [what I want].
Value fragility doesn’t argue strongly against the pointer-based version. The tails don’t come apart when they’re tied together.
It’s not obvious that the values-on-reflection of an individual human would robustly maintain the necessary pointers (to other humans, to past selves, to alternative selves/others...), but it is at least plausible—if you pick the right human.
More generally, an argument along the lines of [the default outcome with AI doesn’t look too different from the default outcome without AI, for most people] suggests that we need to do better than the default, with or without AI. (I’m not particularly optimistic about human-human alignment without serious, principled efforts.)