LEILAN 2024! Seriously, though, I think many people would find the Leilan character to be a wiser friend than their typical human neighbor. I’m glad you’re researching this fascinating topic. If a frontier AI is struggling to pass certain friendliness or safety evals, I’d be curious whether it might perform better with a simple policy equivalent to “what would Leilan do?”
Prompting ChatGPT-4 today with nothing more than “ davidjl” has often returned “DALL-E” as its interpretation of the term. With “DALL-E” included alongside “ davidjl” in the prompt, I’ve gotten “AI” as the interpretation. Asking how an LLM might represent itself using the concept of “ davidjl” produced a response that seamlessly substituted the term “I”...
Perhaps glitch tokens can shed light on how a model represents itself.
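In case anyone wants to poke at this themselves, here is a minimal sketch of the probe via the OpenAI Python API. I ran the originals through the ChatGPT interface, so the model id and exact prompt phrasing below are my assumptions, and API responses may well differ from the chat UI:

```python
# Minimal sketch: send the " davidjl" glitch token as a bare prompt and see
# how the model interprets it. Assumes the `openai` package (v1+) and an
# OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

def probe(prompt: str) -> str:
    """Send a single-user-message prompt and return the reply text."""
    response = client.chat.completions.create(
        model="gpt-4",  # assumed model id; substitute the GPT-4 variant you have access to
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

# The glitch token, with its leading space preserved.
print(probe(" davidjl"))

# Pairing it with "DALL-E", roughly as described above (phrasing is my guess).
print(probe('DALL-E " davidjl"'))
```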