This level of arrogant, dangerous incompetence from a multi-trillion-dollar tech company is disheartening, but if your theory is correct (and it seems increasingly plausible), then I guess the good news is that Sydney is not evidence for the failure of OpenAI-style RLHF with scale.
Unaligned AGI doesn’t take over the world by killing us—it takes over the world by seducing us.
No, but the hacks of ChatGPT already provided a demonstration of problems with RLHF. I’m worried we’re in a situation analogous to ‘Smashing The Stack For Fun And Profit’ being published 27 years ago (reinventing vulns known since MULTICS in the 1960s) and all the C/C++ programmers in denial are going ‘bro I can patch that example, it’s no big deal, it’s just a loophole, we don’t need to change everything, you just gotta get good at memory management, bro, this isn’t hard to fix bro use a sanitizer and turn on -Wall, we don’t need to stop using C-like languages, u gotta believe me we can’t afford a 20% slowdown and it definitely won’t take us 3 decades and still be finding remote zero-days and new gadgets no way man you’re just making that up stop doom-mongering and FUDing bro (i’m too old to learn a new language)’.
Very, very funny example to use with Jake, a veteran C++ wizard.
“Unaligned AGI doesn’t take over the world by killing us—it takes over the world by seducing us.”
Why not both?