tl;dr: people change their minds, the reasons things happen are complex, we need a more forgiving mindset if we want to align AI, and long-term impact is hard to measure. At the bottom I try to put numbers on EleutherAI’s impact and find it was plausibly net positive.
I don’t think discussing whether someone really wants to do good or whether there is some (possibly unconscious?) status-optimization process is going to help us align AI.
The situation is often mixed for a lot of people, and it evolves over time. The culture we need here to solve AI existential risk needs to be more forgiving. Imagine an ML professor who has been publishing papers advancing the state of the art for 20 years suddenly goes “Oh, actually alignment seems important, I changed my mind.” Would you write a LW post condemning them, and another lengthy comment about their status-seeking behavior in publishing papers just to advance their career as a professor?
I recently talked to an OpenAI employee who met Connor something like three years ago, when the whole “reproducing GPT-2” thing came about. He mostly remembered things like the model not having been benchmarked carefully enough. Sure, it did not perform nearly as well on a lot of metrics, but that’s kind of missing the point of how this actually happened. As Connor explains, he did not know this would go anywhere, and he spent something like two weeks working on it without much DL experience. He ended up being convinced by some MIRI people not to release it, since releasing it would establish a “bad precedent”.
I like to think that people can start with a wrong model of what is good and then update in the right direction. Yes, starting yet another “open-sourcing GPT-3” endeavor the next year is not evidence of having completely updated towards “let’s minimize the risk of advancing capabilities research at all costs”, though I do think that some fraction of people at EleutherAI truly care about alignment and just did not think that the marginal impact of “GPT-Neo/-J accelerating AI timelines” justified not publishing them at all.
My model of the EleutherAI story is mostly one of “when all you have is a hammer, everything looks like a nail”. Like, you’ve reproduced GPT-2 and you have access to lots of compute, so why not try out GPT-3? And that’s fine. Like, who knew that the thing would become a Discord server with thousands of people talking about ML? That they would somewhat succeed? And then, when the thing is pretty much already somewhat on the rails, what choice do you even have? Delete the server? Tell the people who have been working hard for months to open-source GPT-3-like models that “we should not publish it after all”? Sure, that would have minimized the risk of accelerating timelines. Though when trying to put numbers on it below, I find that it’s not just “stop something clearly net negative”; it’s much more nuanced than that.
And after talking for hours to one of the guys who worked on GPT-J, talking to Connor for 3h, and then having to replay what he said multiple times while editing the video/audio, etc., I have a clearer sense of where they’re coming from. I think a more productive way of making progress in the future is to look at what the positives and negatives were, and put numbers on what was plausibly net good and plausibly net bad, so we can focus on doing the good things in the future and maximize EV (not just minimize the risk of negatives!).
To be clear, I started the interview with a lot of questions about the impact of EleutherAI, and right now I have a lot more positive or mixed evidence for why it was not “certainly a net negative” (not saying it was certainly net positive). Here is my estimate of the impact of EleutherAI, where I give an 80% confidence interval for each item’s impact on aligning AI, using “-1” as the unit for the negative impact of publishing the GPT-3 paper. E.g. (-2, -1) means “an 80% chance that the impact was between 2x and 1x the GPT-3 paper’s negative impact”.
Mostly Negative
- Publishing the Pile: (-0.4, -0.1) (AI labs, including top ones, use the Pile to train their models)
- Making ML researchers more interested in scaling: (-0.1, -0.025) (GPT-3 spread the scaling meme, not EleutherAI)
- Potential harm from future models open-sourced using the current infrastructure: (-1, -0.1) (they do seem open to open-sourcing more things, although plausibly more carefully)

Mixed
- Publishing GPT-J: (-0.4, 0.2) (easier to finetune than GPT-Neo, and some people use it, though admittedly it was not SoTA when released and top AI labs supposedly had better models. Interpretability / alignment people, like at Redwood, use GPT-J / GPT-Neo models to interpret LLMs)

Mostly Positive
- Making ML researchers more interested in alignment: (0.2, 1) (cf. the part where Connor mentions ML professors moving to alignment partly because of EleutherAI)
- Four of the five core people of EleutherAI changing their careers to work on alignment, some of them setting up Conjecture, with tacit knowledge of how these large models work: (0.25, 1)
- Making alignment people more interested in prosaic alignment: (0.1, 0.5)
- Creating a space with a strong rationalist and ML culture, where people can talk about scaling, where alignment is high-status, where alignment people can discuss what they care about in real time, and where scaling / ML people can learn about alignment: (0.35, 0.8)
Adding these up (I know you can’t just add confidence intervals like this; this is not how probability works), I get an 80% chance of the impact being in (-1, 3.275), so plausibly net good.
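As a sanity check on that total, here is a minimal sketch of the arithmetic in Python, assuming (as above) that the interval endpoints are simply summed; the short item labels are my own paraphrases of the entries above, not official names:

```python
# Hypothetical sketch: sum the 80%-interval endpoints from the list above.
# Unit: the negative impact of publishing the GPT-3 paper = -1.
intervals = {
    "Publishing the Pile": (-0.4, -0.1),
    "ML researchers more interested in scaling": (-0.1, -0.025),
    "Future open-sourced models": (-1, -0.1),
    "Publishing GPT-J": (-0.4, 0.2),
    "ML researchers more interested in alignment": (0.2, 1),
    "Core members moving to alignment work": (0.25, 1),
    "Alignment people more interested in prosaic alignment": (0.1, 0.5),
    "Community space where alignment is high-status": (0.35, 0.8),
}

low = sum(lo for lo, _ in intervals.values())
high = sum(hi for _, hi in intervals.values())
print(f"({round(low, 3)}, {round(high, 3)})")  # (-1.0, 3.275)
```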
Like, who knew that the thing would become a Discord server with thousands of people talking about ML? That they would somewhat succeed? And then, when the thing is pretty much already somewhat on the rails, what choice do you even have? Delete the server? Tell the people who have been working hard for months to open-source GPT-3-like models that “we should not publish it after all”?
I think this eloquent quote can serve to depict an important, general class of dynamics that can contribute to anthropogenic x-risks.
Funnily enough, I ended up retracting the comment around 9 minutes before you posted yours, prompted by this thread and the concerns you outlined about this sort of psychologizing being unproductive. I basically agree with your response.
I don’t think discussing whether someone really wants to do good or whether there is some (possibly unconscious?) status-optimization process is going to help us align AI.
Two comments:
[wanting to do good] vs. [one’s behavior being affected by an unconscious optimization for status/power] is a false dichotomy.
Don’t you think that unilateral interventions within the EA/AIS communities to create/fund for-profit AGI companies, or to develop/disseminate AI capabilities, could have a negative impact on humanity’s ability to avoid existential catastrophes from AI?
First point: by “really want to do good” (the really is important here) I mean someone who would be fundamentally altruistic and would not have any status/power desire, even subconsciously.
I don’t think Conjecture is an “AGI company”, everyone I’ve met there cares deeply about alignment and their alignment team is a decent fraction of the entire company. Plus they’re funding the incubator.
I think it’s also a misconception that it was a unilateralist intervention. They talked to other people in the community before starting it; it was not a secret.
First point: by “really want to do good” (the really is important here) I mean someone who would be fundamentally altruistic and would not have any status/power desire, even subconsciously.
Then I’d argue the dichotomy is vacuously true, i.e. it does not generally pertain to humans. Humans are the result of human evolution. It’s likely that having a brain that (unconsciously) optimizes for status/power has been very adaptive.
Regarding the rest of your comment, this thread seems relevant.