Why do you think the alignment team at OpenAI is contributing on net to AI danger?
Maybe I don’t know enough about OpenAI’s alignment team to criticize it in public? I wanted to name one alignment outfit because I like to be as specific as possible in my writing. OpenAI popped into my head because of the reasons I describe below. I would be interested in your opinion. Maybe you’ll change my mind.
I had severe doubts about the alignment project (the plan of creating an aligned superintelligence before any group manages an unaligned one) even before Eliezer went public with his grave doubts in the fall of last year. It’s not that I consider the project impossible in principle, just that it is so difficult that it seems unlikely we will accomplish it before the appearance of an unaligned intelligence that kills us all. In other words, I see it as humanly possible, but probably not humanly possible quickly enough. Anna Salamon was saying in 2010 or so that alignment research (called Friendliness research or Friendly-AI research in those days, IIRC) was like trying to invent differential equations before the rest of the world invents elementary algebra, which is basically the same take, unless I misinterpreted her. Since then, of course, there has been an alarming amount of progress toward inventing the elementary algebra of her analogy.
I have no objection to people’s continuing to work on alignment, and I’d offer Scott Garrabrant as an example of someone doing good work on it, but it seems unlikely that anyone employed by an organization whose main plan for getting money is to sell AI capabilities would be able to sustainably do good work on it: humans are too easily influenced by their workplaces and by the source of their personal economic security. And OpenAI’s main plan for getting money is to sell AI capabilities. (They were founded as a non-profit, but in 2019 they restructured around a capped-profit subsidiary.)
Also, at OpenAI’s founding, the main strategy proposed for ensuring that AI would turn out well for humanity was for OpenAI to publish all its research! Sam Altman has walked that plan back somewhat, but he didn’t change the name of the organization, even though the name is a very concise description of the original plan. That is a sign that he doesn’t really get how misguided and (unintentionally) destructive the original plan was: it is not uncommon for an organization to change its name when it undergoes a true change in strategy or approach.
For about 28 years, I continued to see doctors who would offer what I knew was bad advice. (And during that time I saw many doctors, because I have chronic health conditions; I was already an adult at the start of that 28-year interval.) As long as a doctor did not cost me much and was generally willing to order a significant fraction of the tests and prescribe a significant fraction of the drugs I asked for, I tended to keep seeing that doctor. I stopped doing that because I had accumulated a lot of evidence that their bad advice tended to affect my behavior (to change it to conform to the advice) even though I recognized the advice as bad as soon as it was conveyed to me.
You (the reader, not just the person I am replying to, namely Nisan) might not share my difficulty in remaining uninfluenced by bad advice from authority figures. Maybe I’m more suggestible than you are. But do you know that for sure? If not, why not err on the side of caution? There are many employers in this world! Why not avoid working for any outfit in which a large fraction of the employees are embarked on a project that will probably eventually kill us all and have significant career capital invested in that project?
Hmm. I know you (Nisan) work or used to work for Google. I notice that I don’t object to that. I notice that I don’t seem to object much to anyone’s working for an outfit that does a lot of capability research if that is the most efficient way for them to provide for themselves or their family. I just don’t like it as a plan for improving the world. If the best plan a person can come up with for improving the world involves working for an outfit that does a lot of capability research, well, I tend to think that person should postpone their ambitions to improve the world and focus on becoming stronger (more rational) and making money to provide for themselves and their family until such time as they can think up a better plan!
Also, my non-objection to people’s continuing to work for AI-capability outfits for personal economic reasons applies only to people who have already invested a lot of time and energy in learning to do that kind of work (through learning on the job or on their own dime): it is a bad idea IMO for anyone not already on the capabilities-research career path to get on it. I know that many here would disagree, but IMO getting good at AI capabilities work very probably doesn’t help much with AI alignment work. Look, for example, at the work of Scott Garrabrant (Cartesian frames, finite factored sets): very rarely, if at all, does it rely on the capabilities literature.
Thanks for sharing your reasoning. For what it’s worth, I worked on OpenAI’s alignment team for two years and think they do good work :) I can’t speak objectively, but I’d be happy to see talented people continue to join their team.
I think they’re reducing AI x-risk in expectation because of the alignment research they publish (1 2 3 4). If anyone thinks that research or that kind of research is bad for the world, I’m happy to discuss.
Thanks for your constructive attitude to my words.
I have a different intuition here; I would much prefer the alignment team at e.g. DeepMind to be working at DeepMind as opposed to doing their work for some “alignment-only” outfit. My guess is that there is a non-negligible influence that an alignment team can have on a capabilities org in the form of:
- The alignment team interacting with other staff, either casually in the office or by e.g. running internal workshops open to all staff (like DeepMind apparently do)
- The org consulting with the alignment team (e.g. before releasing models or starting dangerous projects)
- Staff working on raw capabilities having somewhere easy to go if they want to shift to alignment work

I think the above benefits likely outweigh the impact of the influence in the other direction (such as the value drift from having economic or social incentives linked to capabilities work).
My sense is that this “they’ll encourage higher-ups to think what they’re doing is safe” thing is a meme. Misaligned AI, for people like Yann LeCun, is not even a consideration; they think it’s this stupid, uninformed fearmongering. We’re not even near the point Philip Morris is at, where tobacco execs have to plaster their webpage with “beyond tobacco” slogans to feel good about themselves. Demis Hassabis literally does not care, even a little bit, and adding alignment staff will not affect his decision-making whatsoever.
But shouldn’t we just ask Rohin Shah?
Even a little bit? Are you sure? https://www.lesswrong.com/posts/ido3qfidfDJbigTEQ/have-you-tried-hiring-people?commentId=wpcLnotG4cG9uynjC