Safety-wise, they claim to have run it through their Preparedness Framework and red-teaming with external experts, but have published no reports on either. “For now”, audio output is limited to a selection of preset voices (addressing audio impersonation).
“Safety”-wise, they obviously haven’t considered the implications of (a) trying to make it sound human and (b) having it try to get the user to like it.
It’s extremely sycophantic, and the voice intensifies the effect. They even had their demonstrator show it a sign saying “I ❤️ ChatGPT”, and instead of flatly saying “I am a machine. Get counseling.”, it acted flattered.
At the moment, it’s really creepy, and most people seem to dislike it pretty intensely. But I’m sure they’ll tune that out if they can.
There’s a massive backlash against social media selecting for engagement. There’s a lot of worry about AI manipulation. There’s a lot of talk from many places about how “we should have seen the bad impacts of this or that, and we’ll do better in the future”. There’s a lot of high-sounding public interest blather all around. But apparently none of that actually translates into OpenAI, you know, not intentionally training a model to emotionally manipulate humans for commercial purposes.
Still not an X-risk, but definitely on track to build up all the right habits for ignoring one when it pops up...
I’m guessing that measuring performance on those demographic categories will tend to underestimate the models’ potential effectiveness, because the models have been intentionally tuned to “debias” their outputs on those categories, or on things closely related to them.