(iii) because if this was true, then we could presumably just solve alignment without the help of AI assistants.
Either I misunderstand this or it seems incorrect.
It could be the case that the current state of the world doesn’t put us on track to solve alignment in time, but that using AI assistants to increase the ratio of alignment : capabilities work by some amount is sufficient.
The alignment : capabilities ratio of AI-assistant use doesn’t have to track the current ratio of alignment : capabilities work. For instance, if the AI labs with the biggest lead are safety-conscious, I expect the ratio of alignment : capabilities research they produce right before AGI to be much higher than it is now. See here.
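The claim above can be made concrete with a toy model. The numbers below (progress needed, baseline rates, speedup multipliers) are purely illustrative assumptions, not estimates from the discussion: the point is only that a trajectory that falls short without assistants can succeed if assistants speed up alignment somewhat more than capabilities.

```python
# Toy model (illustrative assumptions only): alignment must accumulate
# `align_needed` units of progress before capabilities accumulates
# `cap_limit` units (i.e. before AGI arrives).

def finishes_in_time(align_rate, cap_rate,
                     align_needed=100.0, cap_limit=100.0):
    """True if alignment reaches align_needed units of progress
    before capabilities reaches cap_limit units."""
    time_to_agi = cap_limit / cap_rate
    return align_rate * time_to_agi >= align_needed

# Baseline (assumed): alignment at 0.8 units/step, capabilities at 1.0.
# Not on track: alignment only reaches 80 of the needed 100 units.
assert not finishes_in_time(0.8, 1.0)

# AI assistants speed up BOTH kinds of work, but alignment somewhat more
# (assumed 1.5x vs 1.1x). The modest differential is enough here:
# alignment rate 1.2 vs capabilities rate 1.1 -> ~109 units in time.
assert finishes_in_time(0.8 * 1.5, 1.0 * 1.1)
```

The sketch also shows why the differential matters more than the absolute speedup: multiplying both rates by the same factor leaves `finishes_in_time` unchanged, since `time_to_agi` shrinks exactly as fast as alignment progress accelerates.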
> (iii) because if this was true, then we could presumably just solve alignment without the help of AI assistants.
Either I misunderstand this or it seems incorrect.
Hm, I think you are right—as written, the claim is false. I think some version of (X)—the assumption around your ability to differentially use AI assistants for alignment—will still be relevant; it will just need a bit more careful phrasing. Let me know if this makes sense:
To get a more realistic assumption, perhaps we should talk about (speedup) “how much are AI assistants able to speed up alignment vs capabilities” and (proliferation prevention) “how much can OpenAI prevent them from proliferating to capabilities research”.[1] The corresponding more realistic versions of the claims would then be that:
either (i’) AI assistants will fundamentally be able to speed up alignment much more than capabilities
or (ii’) the potential speedup ratios will be comparable, but OpenAI will be able to significantly restrict the proliferation of AI assistants for capabilities research
or (iii’) both the potential speedup ratios and the adoption rates of AI assistants for capabilities research will be comparable, but somehow we will have enough time to solve alignment anyway.
Comments:
Regarding (iii’): It seems that in the worlds where (iii’) holds, you could just as well solve alignment without developing AI assistants.
Regarding (i’): Personally I don’t buy this assumption. But you could argue for it on the grounds that perhaps alignment is just impossible to solve for unassisted humans. (Otherwise arguing for (i’) seems rather hard to me.)
Regarding (ii’): As before, this seems implausible based on the track record :-).
This implicitly assumes that if OpenAI develops the AI assistant technology and restricts proliferation, you will get similar adoption in capabilities vs alignment. This seems realistic.
Makes sense. FWIW, based on Jan’s comments I think the main/only thing the OpenAI alignment team is aiming for here is (i), differentially speeding up alignment research. It doesn’t seem like Jan believes in this plan; personally I don’t believe in this plan.
> 4. We want to focus on aspects of research work that are differentially helpful to alignment. However, most of our day-to-day work looks like pretty normal ML work, so it might be that we’ll see limited alignment research acceleration before ML research automation happens.
I don’t know how to link to the specific comment, but here somewhere. Also:
> We can focus on tasks differentially useful to alignment research
Your pessimism about (iii’) still seems a bit off to me. I agree that if you were coordinating well between all the actors, then yeah, you could just hold off on AI assistants. But the actual decision the OpenAI alignment team is facing could be more like “use LLMs to help with alignment research or get left behind when ML research gets automated”. If facing such choices I might produce a plan like theirs, but notably I would be much more pessimistic about it. When the universe limits you to one option, you shouldn’t expect it to be particularly good. The option “everybody agrees to not build AI assistants and we can do alignment research first” is maybe not on the table, or at least it probably doesn’t feel like it is to the alignment team at OpenAI.
Oh, I think I agree—if the choice is to use AI assistants or not, then use them. If they need adapting to be useful for alignment, then do adapt them.
But suppose they only work kind-of-poorly—and using them for alignment requires making progress on them (which will also be useful for capabilities), and you will not be able to keep those results internal. And that you can either do this work or do literally nothing. (Which is unrealistic.) Then I would say doing literally nothing is better. (Though it certainly feels bad, and probably costs you your job. So I guess some third option would be preferable.)