In case I get distracted and fail to come back to this, I just want to say that I think this type of project is extremely valuable and should be the main focus of the AI safety/alignment movement (IMHO). From my perspective, most effort seems to go into writing arguments for why a future AGI will be a disaster by default, along with some highly theoretical ideas for preventing this from happening. That discourse ignores current ML systems, understandably, because future AGI will be qualitatively different from our current models.
However, the problem with this a priori approach is that it alienates the people working in the industry, who are ultimately the ones we need to be in conversation with if alignment is ever going to become a mainstream concern. What we really need are experiments like what you're doing, i.e. actually getting a real AI to do something nasty on camera. This helps us learn to deal with nasty AI, but, far more importantly I think, it puts the AI safety conversation in the same experimental setting as the rest of the field.