Very insightful post. Here are personal thoughts with low epistemic status and high rambling potential:
These all feel to me like corollaries of the belief “AGI is so important that I can’t gauge the value of anything else except by how it affects AGI”. Hence: “everything else is meaningless because AGI will change everything soon”, or “nobody around me is looking up at the meteor about to hit us, and that makes me feel kind of insane (*cough* so I hang out with rationalists, whose entire shtick is learning how not to be insane)”.
As for other non-obvious effects: I personally sense a kind of perceived fragility around the whole field. There are arguments on this site for why AGI alignment should not be discussed in politics, or why attempting to convince OpenAI or DeepMind employees to switch jobs can easily backfire (e.g. this post for caution advice). These make any outreach at all seem risky. There are also people I know wondering whether they should attempt to do anything at all related to alignment, because they perceive themselves as probable dead weights. The relatively short timelines, the sheer scope, and the aura of impossibility around alignment seem to make people more cautious than they should be. Obviously the whole point of the field is to be cautious; but while it’s true that the tried-and-tested scientific method isn’t safe for AGI in general, I’m not sure stressing the rationalist-tools, solve-problems-before-you-experiment approach is healthy everywhere. So caution is right there in the description of the field, but you have to contain it well so that it doesn’t infect places where you would do well to be more reckless and use trial and error. I am probably quite wrong about this, but I don’t see many people talking about it, so if there’s any reasonable doubt we should figure it out.
Alignment work should probably be perceived as less fragile. Unlike the AI field in general, alignment projects specifically don’t pose much of a risk to the world, so we can probably afford to be looser here than elsewhere. In my experience, though, alignment feels like a pack of delicate butterflies flying together, with every flap of wings sending dozens of comrades spiraling out of the sky, which might or might not set off a domino/Rube Goldberg machine that blows up the world.
Alignment is also perceived as fragile. Almost all paradigms of alignment and AI safety research (interpretability, agent foundations, prosaic alignment, model encryption, etc.) are regularly criticised on LW by different people as, at best, totally ineffectual from an opportunity-cost perspective, and at worst downright harmful, whether through unforeseen effects or as safety-washing enablers for AGI labs. (I myself am guilty of many such criticisms.)
OTOH, this very work on the strategy and methodology of AI safety development could reasonably be criticised as worsening the psychological state of AI safety researchers, and therefore as potentially net harmful despite its marginal improvements to strategy and methodology (if those improvements even happen in practice, which is not clear to me).