I’m a researcher on the technical governance team at MIRI.
Views expressed are my own and should not be taken to represent official MIRI positions; likewise, views within the technical governance team vary.
Previously:
Helped with MATS, running the technical side of the London extension (pre-LISA).
Worked for a while on Debate (this kind of thing).
Quick takes on the above:
I think MATS is great-for-what-it-is. My misgivings relate to high-level direction.
Worth noting that PIBBSS exists, and is philosophically closer to my ideal.
The technical AISF course doesn’t have the emphasis I’d choose (which would be closer to Key Phenomena in AI Risk). It’s a decent survey of current activity, but only implicitly gets at fundamentals—mostly through a [notice what current approaches miss, and will continue to miss] mechanism.
I don’t expect research on Debate, or scalable oversight more generally, to help significantly in reducing AI x-risk. (I may be wrong! Some elaboration in this comment thread.)
I think there’s a decent case that such updating will indeed disincentivize making positive EV bets (in some cases, at least).
In principle we’d want to update on the quality of all past decision-making. That would include both [made an explicit bet by taking some action] and [made an implicit bet through inaction]. With such an approach, decision-makers could be punished/rewarded with the symmetry required to avoid undesirable incentives (mostly).
Even here it’s hard, since there’d always need to be a [gain more influence] mechanism to balance the possibility of losing your influence.
In practice, most of the implicit bets made through inaction go unnoticed—even where they’re high-stakes (arguably especially when they’re high-stakes: most counterfactual value lies in the actions that won’t get done by someone else; you won’t be punished for being late to the party when the party never happens).
That leaves the explicit bets. The incentive, if you want to look like a good decision-maker, is then to make low-variance, explicit positive-EV bets, and to rely on the fact that most of the high-variance, high-EV opportunities you’re not taking will go unnoticed.
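To make the asymmetry concrete, here’s a toy sketch (all numbers hypothetical, purely illustrative): if the evaluator only punishes visible failures and never notices foregone opportunities, a higher-EV but high-variance bet scores worse reputationally than both a safe bet and inaction.

```python
# Toy illustration with made-up numbers; not a model of any real funder.
# The evaluator only sees explicit outcomes, so visible failures are punished
# while opportunities passed up through inaction cost nothing.

def expected_value(outcomes):
    """Expected monetary value of a bet, given (probability, payoff) pairs."""
    return sum(p * x for p, x in outcomes)

def expected_reputation(outcomes, failure_penalty=1.0):
    """Reputational payoff when only visible (negative) outcomes are punished."""
    return sum(p * (-failure_penalty if x < 0 else 0.0) for p, x in outcomes)

safe_bet  = [(1.0, 1.0)]                 # certain +1
risky_bet = [(0.2, 10.0), (0.8, -1.0)]   # EV = +1.2, but fails visibly 80% of the time
inaction  = [(1.0, 0.0)]                 # EV = 0, and reputationally free

for name, bet in [("safe", safe_bet), ("risky", risky_bet), ("inaction", inaction)]:
    print(f"{name:8s} EV = {expected_value(bet):+.2f}   reputation = {expected_reputation(bet):+.2f}")

# safe     EV = +1.00   reputation = +0.00
# risky    EV = +1.20   reputation = -0.80
# inaction EV = +0.00   reputation = +0.00
```

Under this scoring, the reputation-maximizing move is the safe bet (or doing nothing), even though the risky bet has the highest EV; the missing piece is any reward for the upside that inaction forgoes.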
From my by-no-means-fully-informed perspective, the failure mode at OpenPhil in recent years seems not to be [too many explicit bets that don’t turn out well], but rather [too many failures to make unclear bets, so that most EV is left on the table]. I don’t see support for hits-based research. I don’t see serious attempts to shape the incentive landscape to encourage sufficient exploration. It’s not clear that things are structurally set up so that anyone at OP has time to do such things well; my impression is that they don’t have time, and that thinking about such things is no one’s job (am I wrong??).
It’s not obvious to me whether the OpenAI grant was a bad idea ex ante (though it’s probably not something I’d have done).
However, I think that another incentive towards middle-of-the-road, risk-averse grant-making is the last thing OP needs.
That said, I suppose much of the downside might be mitigated by making a distinction between [you wasted a lot of money in ways you can’t legibly justify] and [you funded a process with (clear, ex-ante) high negative impact].
If anyone’s proposing punishing the latter, I’d want it made very clear that this doesn’t imply punishing the former. I expect that the best policies do involve wasting a bunch of money in ways that can’t be legibly justified on the individual-funding-decision level.