I do agree with some points in this post. The failure mode is the same as the one I mentioned about why people do interpretability, for instance (see the section "Outside view: The proportion of junior researchers doing Interp rather than other technical work is too high"), and I think this generalizes somewhat to the whole field of alignment. But I'm highly skeptical that recruiting a bunch of physicists to work on alignment would be that productive:
Empirically, we’ve already kind of tested this, and it doesn’t work.
I don't think that what Scott Aaronson produced while at OpenAI really helped AI safety. He did exactly what is criticized in the post: streetlight research, using techniques he was already familiar with from his previous field. I don't think the author of the OP would disagree with me. Maybe it's n=1, but it was one of the most promising shots.
Two years ago, I was doing field-building and trying to source talent, primarily selecting on pure intellect and raw IQ. I organized the Von Neumann Symposium around the problem of corrigibility, and I targeted IMO laureates and individuals from the best school in France, ENS Ulm, which arguably has the highest concentration of future Nobel laureates in the world. However, pure intelligence doesn't work. In the long term, the individuals who succeeded in the field weren't the valedictorians from France's top school, but rather those who were motivated, had read The Sequences, were EA people, possessed good epistemology, and were willing to share their work online. (Maybe you'll say that the people I was targeting were too young, but I think even my limited empirical experience is much better than the speculation in the OP.)
My prediction is that if you tried to put a group of skilled physicists in a room, first, it's not even clear you would find that many motivated people in this reference class, and second, I don't think the few who were motivated would produce good-quality work.
For the ML4Good bootcamps, the scoring system reflects this insight: we use multiple indicators and don't rely solely on pure IQ to select participants, because there is little correlation between pure high IQ and long-term quality production.
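For concreteness, here is a minimal sketch of what such multi-indicator scoring could look like; the indicator names and weights below are purely hypothetical illustrations on my part, not ML4Good's actual rubric.

```python
# Hypothetical illustration only: neither the indicators nor the weights
# come from ML4Good; they just show the shape of a multi-indicator score.

WEIGHTS = {
    "motivation": 0.35,
    "epistemics": 0.25,
    "sharing_work_publicly": 0.20,
    "raw_problem_solving": 0.20,  # IQ-like ability is one signal among several
}

def score_candidate(indicators: dict) -> float:
    """Weighted sum of indicator scores, each on a 0-1 scale."""
    return sum(WEIGHTS[name] * indicators.get(name, 0.0) for name in WEIGHTS)

candidate = {
    "motivation": 1.0,
    "epistemics": 0.8,
    "sharing_work_publicly": 0.6,
    "raw_problem_solving": 0.5,
}
print(f"Composite score: {score_candidate(candidate):.2f}")  # ~0.77
```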
I believe the biggest mistake in the field is trying to solve "Alignment" rather than focusing on reducing catastrophic AI risks. Alignment is a confused paradigm; it's a conflationary alliance term that has sedimented over the years, and it's often unclear what people mean when they talk about it: safety isn't safety without a social model.
Think about what has been most productive in reducing AI risks so far. My short list would be:
The proposed SB 1047 legislation.
The short statement on AI risks.
The Frontier AI Safety Commitments (AI Seoul Summit 2024), which encouraged labs to publish their responsible scaling policies.
Scary demonstrations that showcase toy models of deception, fake alignment, etc., and build more scientific consensus, which is very much needed.
As a result, the field of "Risk Management" is more fundamental for reducing AI risks than "AI Alignment." In my view, the theoretical parts of the alignment field have contributed far less to reducing existential risks than responsible scaling policies or the draft of the EU AI Act's Code of Practice for General Purpose AI Systems, which is currently not far from being the state of the art for AI risk management. Obviously, it's still incomplete, but I think that's the most productive direction today.
Relatedly, the Swiss cheese model of safety is underappreciated in the field. This model has worked across other industries and seems to be what works for the only general intelligence we know: humans. Humans use a mixture of strategies for safety that we could imitate for AI safety (see this draft). However, the agent foundations community seems to be neglecting this entirely.
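To illustrate why layered defenses matter, here is a toy calculation of my own (not from the post), assuming each layer fails independently; real layers are often correlated, which is precisely the main caveat in practice.

```python
# Toy illustration of the Swiss cheese model: a hazard only becomes a
# catastrophe if it slips through the holes in every layer.
# Assumes independent layer failures, which is optimistic.

def residual_risk(failure_probs):
    """Probability that a hazard passes through all defensive layers."""
    risk = 1.0
    for p in failure_probs:
        risk *= p
    return risk

# Hypothetical layers: pre-deployment evals, runtime monitoring, incident response.
layers = [0.10, 0.20, 0.30]
print(f"Residual risk with all three layers: {residual_risk(layers):.4f}")  # 0.0060
print(f"Best single layer on its own: {min(layers):.2f}")                   # 0.10
```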
I agree with claims 2 and 3, but not with claim 1.
I think "random physicist" is not super fair; it looks like, from his standpoint, he indeed met physicists willing to do "alignment" research who had backgrounds in research and developing theory.
We didn't find PhD students to work on alignment, but we also didn't try (at least not CeSIA / EffiSciences).
It's true that most of the people we found who wanted to work on the problem were the motivated ones, but from the point of view of the alignment problem, recruiting them could still be a mistake (saturating the field, etc.).
What do you think of my point about Scott Aaronson? Also, since you agree with points 2 and 3, it seems you also think that the most useful work from last year didn't require advanced physics, so isn't that in contradiction with your disagreeing with point 1?