I have sometimes seen people/contests focused on writing up specific scenarios for how AI can go wrong, starting from our current situation and fictionally projecting into the future. I think the idea is that this can act as an intuition pump and potentially a way to convince people.
I think that is likely net negative: state-of-the-art AIs are being trained on internet text, and stories in which a good agent starts behaving badly are a key component motivating the Waluigi effect.
These sorts of stories still seem worth thinking about, but perhaps greater care should be taken not to inject GPT-5's training data with examples of chatbots that turn murderous. Maybe post such stories only as a zip file, or encode them with a simple cipher.
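As a minimal sketch of what "a simple cipher" could look like in practice (assuming Python and the standard library's ROT13 codec; the story text is a placeholder), one might encode a scenario before posting it:

```python
import codecs

# Placeholder for the scenario text one would rather keep out of scraped corpora.
story = "A fictional scenario in which a helpful chatbot turns murderous..."

# ROT13 is a trivial substitution cipher. The goal is not secrecy,
# just keeping the plain text out of naively web-scraped training data.
encoded = codecs.encode(story, "rot13")
print(encoded)

# Any reader (or dataset curator) can trivially decode it.
decoded = codecs.decode(encoded, "rot13")
assert decoded == story
```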