The idea that AIs would go around killing everyone instead of just doing what we tell them to do seems like science fiction.
I’ve had this experience too. What baffles me about it is the seeming lack of awareness of the gap between “what we tell them to do” and “what we want them to do.” That gap isn’t sci-fi; it already exists in very concrete ways, and it should be familiar to anyone who has ever written code of any kind.
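To make the gap concrete, here’s a toy Python example (mine, purely for illustration): the computer does exactly what you told it, sorting strings by Unicode code point, rather than what you wanted, which was alphabetical order.

```python
names = ["alice", "Bob", "zoe", "Carol"]

# What you told it: sort these strings.
# Python compares by code point, so uppercase sorts before lowercase.
print(sorted(names))                  # ['Bob', 'Carol', 'alice', 'zoe']

# What you wanted: alphabetical order as a human reads it.
print(sorted(names, key=str.lower))   # ['alice', 'Bob', 'Carol', 'zoe']
```

Nothing malfunctioned here; the instruction was just underspecified relative to the intent.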
I’ve talked about LLM use with (non-AI-expert) colleagues who dismiss the response to a prompt as nonsense. I ask to see the chat logs and dig into them for five minutes. Then I get to inform them that, actually, the LLM’s answer is correct, they didn’t ask what they thought they were asking, they missed an opportunity to learn something new, and, by the way, here’s the three-times-as-long version of the prompt that gives enough context to actually do what they expected.