Current impressions of free energy in the alignment space.
Outreach to capabilities researchers. I think that getting the people who are actually building AGI to be more cautious about alignment and racing makes a bunch of things like coordination agreements possible, and also increases the operational adequacy of capabilities labs.
One of the reasons people don’t like this is that historically outreach hasn’t gone well, but I think that’s because mainstream ML people mostly don’t buy “AGI big deal”, whereas lab capabilities researchers buy “AGI big deal” but not “alignment hard”.
I think retreats, 1-1s, and alignment presentations run by people within labs are all great ways to do this.
I’m somewhat unsure about this one because of downside risk, and also because ‘convince people of X’ is a fairly uncooperative move and bad for everyone’s epistemics.
Conceptual alignment research addressing the hard part of the problem. This is hard and not easy to transition to without a bunch of upskilling, but if the SLT hypothesis is right, there are a bunch of key problems that mostly go unassailed, and so there’s a bunch of low-hanging fruit there.
Strategy research on the other low-hanging fruit in the AI safety space. Ideally, the product of this research would be a public quantitative model of which interventions are effective and why. The path to impact here is finding low-hanging fruit and pointing it out so that people can act on it.
Conceptual alignment research addressing the hard part of the problem. This is hard and not easy to transition to without a bunch of upskilling, but if the SLT hypothesis is right, there are a bunch of key problems that mostly go unassailed, and so there’s a bunch of low-hanging fruit there.
Not all that low-hanging, since Nate is not actually all that vocal about what he means by SLT to anyone but your small group.