I specialize in regulatory affairs for AI-enabled Software as a Medical Device and hope to work in AI risk mitigation.
Jemal Young
Wow, thanks for posting this dialog. The pushback from the human (you?) is commendably unrelenting, like a bulldog with a good grip on ChatGPT’s leg.
ChatGPT seems harder to jailbreak now than it was upon first release. For example, I can’t reproduce the above jailbreaks with prompts copied verbatim, and my own jailbreaks from a few days ago aren’t working.
Has anyone else noticed this? If yes, does that indicate OpenAI has been making tweaks?
Not many more fundamental innovations needed for AGI.
Can you say more about this? Does the DeepMind AGI safety team have ideas about what’s blocking AGI that could be addressed by not many more fundamental innovations?
Why is counterfactual reasoning a matter of concern for AI alignment?
I mean extracting insights from capabilities research that already exists, not changing the direction of new research. For example, specification gaming is on everyone’s radar because it was observed in capabilities research (the authors of the linked post compiled this list of specification-gaming examples, some of which are from the 1980s). I wonder how much more opportunity there might be to piggyback on existing capabilities research for alignment purposes, and maybe to systematize that going forward.
[Question] How might we make better use of AI capabilities research for alignment purposes?
What are the best reasons to think there’s a human-accessible pathway to safe AGI?
Based on the five Maybes you suggested might happen, it sounds like you’re saying some AI doomers are overconfident because there are a million things that could potentially go right. But there doesn’t seem to be a good reason to expect any of those maybes to be likely, and they seem more speculative (e.g. “consciousness comes online”) than the reasons well-informed AI doomers think there’s a good chance of doom this century.
PS I also have no qualifications on this.