Thanks for writing this; I think this is a common and pretty rough experience.
Have you considered doing cybersecurity work related to AI safety? That is, work that would help prevent bad actors from stealing model weights and prevent the AIs themselves from escaping. I think this kind of work would likely be more useful than most alignment work.
I’d recommend reading Holden Karnofsky’s takes, as well as the recent huge RAND report on securing model weights. Redwood’s control agenda might also be relevant.
I think this kind of work is probably extremely useful and somewhat neglected; it especially seems to be missing people who both know about cybersecurity and care about AGI/alignment.
I note that I am uncertain whether working on such a task would increase or decrease global stability and the risk of great power conflict.
Working on this seems good insofar as greater control implies more options. With good security, it’s still possible to opt in to whatever weight-sharing / transparency mechanisms seem net positive, including with adversaries. Without security, there’s no such option.
Granted, the [more options are likely better] conclusion is clearer if we condition on wise strategy.
However, [we have great security, therefore we’re sharing nothing with adversaries] is clearly not a valid inference in general.
Not necessarily. If we have the option to hide information, then even if we reveal information, adversaries may still assume (likely correctly) we aren’t sharing all our information, and that we are closer to a decisive strategic advantage than we appear, even in the case where we do share all our information (which we won’t).
Of course the [more options are likely better] conclusion holds if the lumbering, slow, disorganized, and collectively stupid organizations which have those options somehow execute the best strategy, but they’re not actually going to take the best strategy, especially when it comes to US-China relations.
ETA:
“[we have great security, therefore we’re sharing nothing with adversaries] is clearly not a valid inference in general.”
I don’t think the conclusion holds if that is true in general, and I don’t think I ever assumed or argued that it was.
“then even if we reveal information, adversaries may still assume (likely correctly) we aren’t sharing all our information”
I think the same reasoning applies if they hack us: they’ll assume that the stuff they were able to hack was the part we left suspiciously vulnerable, and that the really important information is behind more serious security.
I expect they’ll assume we’re in control either way, once the stakes are really high. It seems preferable to actually be in control.
I’ll grant that it’s far from clear that the best strategy would be used.
(apologies if I misinterpreted your assumptions in my previous reply)