Someone I know who works at Anthropic, not on alignment, has thought pretty hard about this and concluded it was better than the alternatives. Some factors include:
- by working on capabilities, you free up others for alignment work who were previously doing capabilities but would prefer alignment
- more competition on product decreases aggregate profits of scaling labs
At one point some kind of post was planned, but I'm not sure if this is still happening.
I also think there are significant upskilling benefits to working on capabilities, though I believe this less than I did the other day.
Thanks for your comment, Thomas! I appreciate the effort. I have some questions:
> by working on capabilities, you free up others for alignment work who were previously doing capabilities but would prefer alignment
I am a little confused by this; would you mind spelling it out for me? Imagine "Steve" took a job at "FakeLab" in capabilities. Are you saying that Steve making this decision creates a safety job for "Jane" at FakeLab that otherwise wouldn't have existed?
> more competition on product decreases aggregate profits of scaling labs
Again, I am a bit confused. You're suggesting that if, for example, General Motors announced tomorrow that they were investing $20 billion to start an AGI lab, that would be a good thing?
Jane at FakeLab has a background in interpretability but is currently wrangling data / writing internal tooling / doing some product thing because the company needs her to; otherwise FakeLab would have no product and would be unable to keep operating at all, its safety research included. Steve has a comparative advantage at Jane's current job, so when he takes it, Jane is freed up to move to safety work.
It seems net bad, because the good effect of slowing down OpenAI is smaller than the bad effect of GM racing? But OpenAI probably would be slowed down: they were already trying to build AGI, and GM's entry would leave them with less money and possibly less talent. Thinking about the net effect is complicated, and I don't have time to do it here. The situation for joining an existing lab, rather than founding one, may also be different.