I think a lot of alignment folk have made positive updates in response to the societal response to AI xrisk.
This is probably different from what you’re pointing at (maybe your claim is more like “Lots of alignment folks only make negative updates when responding to technical AI developments” or something like that).
That said, I don’t think the examples you give are especially compelling. I think the following position is quite reasonable (and I think fairly common):
Bing Chat provides evidence that some frontier AI companies will fail at alignment even on relatively “easy” problems that we know how to solve with existing techniques. Also, as Habryka mentioned, it’s evidence that the underlying competitive pressures will make some companies “YOLO” and take excessive risk. This doesn’t affect the absolute difficulty of alignment, but it affects the probability that Earth will actually align AGI.
ChatGPT provides evidence that we can steer the behavior of current large language models. People who predicted that it would be hard to align large language models should update. IMO, many people seem to have made mild updates here, but not strong ones, because they (IMO correctly) claim that their threat models never had strong predictions about the kinds of systems we’re currently seeing and instead predicted that we wouldn’t see major alignment problems until we get smarter systems (e.g., systems with situational awareness and more coherent goals).
(My “Alex sim”, which is not particularly strong, says that maybe these people are just post-hoc rationalizing: if you had asked them in 2015 how likely we would be to be able to control modern LLMs, they would’ve been (a) wrong and (b) wrong in an important way, because their model of how hard it would be to control modern LLMs is very interconnected with their model of why it would be hard to control AGI/superintelligence. Personally, I’m pretty sympathetic to the point that many models of why alignment of AGI/superintelligence would be hard seem relatively disconnected from any predictions about modern LLMs, such that only “small/mild” updates seem appropriate for people who hold those models.)