I can’t help with the object-level determination, but I think you may be overrating both the balance and the import of the second-order evidence.
As far as I can tell, Yudkowsky is a (perhaps dramatically) pessimistic outlier among the class of “rationalist/rationalist-adjacent” SMEs in AI safety, and probably even more so relative to aggregate opinion without an LW-y filter applied (cf.). My impression of the epistemic track record is that Yudkowsky has a tendency to stake out positions (both within and beyond AI) with striking levels of confidence but not commensurately striking levels of accuracy.
In essence, I doubt there’s much epistemic reason to defer to Yudkowsky more (or much more) than to folks like Carl Shulman or Paul Christiano, nor perhaps much more than to “a random AI alignment researcher” or “a superforecaster making a guess after watching a few Rob Miles videos” (although the latter comparisons rest on a few implied premises about difficulty curves and about subject-matter expertise being relatively uncorrelated with judgemental accuracy).
I suggest ~all reasonable attempts at an idealised aggregate wouldn’t take a hand-brake turn to extreme pessimism on finding that Yudkowsky is this pessimistic. My impression is the plurality LW view has shifted from “pretty worried” to “pessimistic” (e.g. p(screwed) > 0.4) rather than to agreement with Yudkowsky, but in any case I’d attribute large shifts in this aggregate mostly to Yudkowsky’s cultural influence on the LW community, plus some degree of internet cabin fever (and selection) distorting collective judgement.
None of this is cause for complacency: even if p(screwed) isn’t ~1, > 0.1 (or 0.001) is ample cause for concern, and resolution on values between (say) [0.1, 0.9] is informative for many things (like personal career choice). I’m not sure whether you get more yield for marginal effort on object-level or second-order uncertainty (e.g. my impression is the ‘LW cluster’ trends towards pessimism, so adjudicating whether this cluster should be over- or under-weighted could be more informative than trying to get up to speed on ELK). I would guess, though, that whatever distils out of LW discourse in 1–2 months will be much more useful than what you’d get right now.
I see the concerns as these:
The four corners of the agreement seem to define ‘disparagement’ broadly, so one might reasonably fear that (e.g.) “first author on an eval especially critical of OpenAI versus its competitors” or “a policy document highly critical of OpenAI leadership decisions” might ‘count’.
Given Altman’s/OpenAI’s vindictiveness and duplicity, and given the previous ‘safeguards’ (from their perspective) that hand them all the cards over whether folks can realise the value of their equity, “they will screw me out of a lot of money if I do something they really don’t like (regardless of whether it ‘counts’ per the non-disparagement agreement)” seems a credible fear.
It appears Altman tried to get Toner kicked off the board for being critical of OpenAI in a policy piece, after all.
This is indeed moot for roles which require equity to be surrendered anyway, but I’d guess most roles outside government (and maybe some within it) have no such requirement. A conflict of interest roughly along the lines of the first two points makes impartial performance difficult, and credible impartial performance impossible (i.e. even if Alice can truthfully swear “my being subject to such an agreement has never influenced my work in AI policy”, reasonable third parties would be unwise to take her word for it).
The ‘non-disclosure of non-disparagement’ makes this worse, as it prevents the conflict of interest from being fully disclosed. “Alice has a bunch of OpenAI equity” is one thing; “Alice has a bunch of OpenAI equity, and has agreed to be beholden to OpenAI in various ways to keep it” is another. We would want to know the latter to critically appraise Alice’s work whenever it touches OpenAI’s interests (and I would guess a lot of policy/eval/regulatory work would be sufficiently relevant that we’d want to consider whether Alice’s commitments colour her position). Yet Alice has also promised to keep these extra relevant details secret.