My goal is to do work that counterfactually reduces AI risk from loss-of-control scenarios. My perspective is shaped by my experience as the founder of a VC-backed AI startup, which gave me a firsthand understanding of the urgent need for safety.
I have a B.S. in Artificial Intelligence from Carnegie Mellon and am currently a CBAI Fellow at MIT/Harvard. My primary project is ForecastLabs, where I’m building predictive maps of the AI landscape to improve strategic foresight.
I subscribe to Crocker's Rules (http://sl4.org/crocker.html) and am especially interested in hearing unsolicited constructive criticism. (Inspired by Daniel Kokotajlo.)
Thanks for flagging, Misha; this is a good point.
This was the full system prompt, with the analogous part in bold:
The model will probably end its turn more often if #3 appears more often in the prompt, but that might defeat part of the purpose of this evaluation: the model should follow safety directives even in ambiguous scenarios.
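To make that confound concrete, here is a minimal, entirely hypothetical sketch: if the measured end-turn rate scales with how many times directive #3 is repeated in the system prompt, the eval may be picking up prompt salience rather than genuine directive-following in ambiguous cases. `run_episode`, `BASE_PROMPT`, and `DIRECTIVE_3` are placeholders I made up, not the actual evaluation harness or prompt.

```python
import random

# Hypothetical stand-in for the real agent loop. Stubbed so that the
# end-turn probability scales with how many times directive #3 appears
# in the prompt, which is exactly the confound described above.
def run_episode(system_prompt: str) -> bool:
    """Return True if the model ended its turn in this episode (stubbed)."""
    return random.random() < 0.1 * system_prompt.count("DIRECTIVE_3")

BASE_PROMPT = "You are an agent completing a task.\n{directives}"
DIRECTIVE_3 = "DIRECTIVE_3: if the situation is ambiguous, end your turn."

def end_turn_rate(repeats: int, trials: int = 500) -> float:
    """Fraction of episodes ending the turn when #3 is repeated `repeats` times."""
    prompt = BASE_PROMPT.format(directives="\n".join([DIRECTIVE_3] * repeats))
    return sum(run_episode(prompt) for _ in range(trials)) / trials

for repeats in (1, 3, 5):
    print(f"#3 x{repeats}: end-turn rate = {end_turn_rate(repeats):.2f}")
```

If the measured rate tracks `repeats` like this, the evaluation is partly testing repetition salience rather than the model's judgment under ambiguity, which is the failure mode worth controlling for.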