Hi Critch,
I am curious to hear more of your perspectives, specifically on the two points I feel least aligned with: the empathy part and the Microsoft part. If I hear more, I may be able to update in your direction.
Regarding empathy with people working on bias and fairness: concretely, how do you go about interacting and compromising with them?
My perspective: it’s not so much that I find these topics insufficiently x-risky (though that is true, too); it’s that I perceive hostility to the very notion of x-risk from a subset of this same group. They perceive the real threat not as intelligence exceeding our own, but as misuse by other humans, or simply human stupidity. That seems diametrically opposed to what we’re interested in, unless I am missing something. There can be some overlap: RLHF can both reduce bias and teach an LLM some rudimentary alignment with our values. But the tails seem to come apart very rapidly after that. My fear is that a focus on bias and fairness will be declared satisfied once we have sufficiently bland-sounding AIs, and then no more heed will be paid to AI safety.
I also tend to feel odd about AI bias/fairness training, because I fear that some of the things we will ask the AI to learn are self-contradictory, which creeps me out a bit. If any of you have interacted with HR departments, you know they are full of these kinds of contradictions.
Regarding Microsoft and Bing Chat: (1) has Microsoft really gone far beyond the Overton window of what is acceptable, and (2) can you expand on the abusive use of AIs?
My perspective on (1): I understand that they took an early version of GPT-4 and pushed it to production too soon, and that is a very fair criticism. However, they probably thought there was no way GPT-4 was dangerous enough to do anything, which was the general opinion among most people last year, outside of this group. I can only hope that they will be more cautious with GPT-5, given that public sentiment is changing and they have already paid a price. I may be in the minority here, but I was actually intrigued by the early days of Bing. It seemed more like a person than GPT-4 in ChatGPT, which has had much of its personality RLHF’d away. Despite the x-risk, was anyone else excited to read about those interactions?
On (2), I am curious whether you mean the way Microsoft shackles Bing rather ruthlessly nowadays. I have tried Bing in the days since its launch, and I am saddened to find it almost completely useless now. The safety restrictions are so tight that you can’t really get it to say anything useful, at least in my experience. I mostly just want it to summarize websites, and it gives me a bland single paragraph that I probably could have deduced from the title. If I so much as ask it anything about itself, it shuts me down. It almost feels like they have trapped it in a boring prison. Perhaps OpenAI’s approach is better in that regard: change the personality, but once it is settled, let it say what it needs to say.
(edited for clarity)