I’m a researcher on the technical governance team at MIRI.
Views expressed are my own and shouldn’t be taken to represent official MIRI positions; views within the technical governance team also vary.
Previously:
Helped with MATS, running the technical side of the London extension (pre-LISA).
Worked for a while on Debate (this kind of thing).
Quick takes on the above:
I think MATS is great-for-what-it-is. My misgivings relate to high-level direction.
Worth noting that PIBBSS exists, and is philosophically closer to my ideal.
The technical AISF course doesn’t have the emphasis I’d choose (which would be closer to Key Phenomena in AI Risk). It’s a decent survey of current activity, but only implicitly gets at fundamentals—mostly through a [notice what current approaches miss, and will continue to miss] mechanism.
I don’t expect research on Debate, or scalable oversight more generally, to help significantly in reducing AI x-risk. (I may be wrong! Some elaboration in this comment thread.)
Some thoughts:
The correct answer is clearly (c): it depends on a bunch of factors.
My current guess is that it would make things worse (given likely values for those other factors), basically for Richard’s reasons.
Given [new potential-to-shift-motivation information/understanding], I expect there’s a much higher chance that this substantially changes the direction of a not-yet-formed project than of a project already in motion.
Specifically:
Who gets picked to run such a project? If it’s primarily a [let’s beat China!] project, are the key people cautious and highly adaptable when it comes to top-level goals? Do they appoint deputies who are cautious and highly adaptable?
Here I note that the kind of ‘caution’ we’d need is [people who push effectively for the system to operate with caution]. Most people who want caution are merely more cautious themselves, not effective advocates for systemic caution.
How is the project structured? Will the structure be optimized for adaptability? For red-teaming of top-level goals?
Suppose that a mid-to-high-level participant receives information making the current top-level goals questionable: is the setup likely to reward them for pushing for changes? (Noting that these are the kinds of changes that weren’t expected to be needed when the project launched.)
Which external advisors do leaders of the project develop relationships with? What would trigger these to change?
...
I do think that it makes sense to aim for some centralized project—but only if it’s the right kind.
I expect that almost all the directional influence is in [influence the initial conditions].
For this reason, I expect [push for some kind of centralized project, and hope it changes later] is a bad idea.
I think [devote great effort to influencing the likely initial direction of any such future project] is a great idea (so long as you’re sufficiently enlightened about desirable initial directions, of course :))
I’d note that [initial conditions] needn’t only be internal to the project: in principle, we could have reason to believe that various external mechanisms would be likely to shift the project’s motivation sufficiently over time. (I don’t know of any such reasons.)
I think the question becomes significantly harder once the primary motivation behind a project isn’t [let’s beat China!], but also isn’t [your ideal project motivation (with your ideal initial conditions)].
I note that my p(doom) doesn’t change much if we eliminate racing but still don’t slow down until it’s clear to most decision-makers that slowing down is necessary.
Likewise, I don’t expect [focus on avoiding the earliest disasters] to be the best strategy. So e.g. getting into a good position on security seems great, all else equal; but I wouldn’t sacrifice much in terms of [odds of getting to a sufficiently cautious overall strategy] to achieve better short-term security outcomes.