My views remain similar to when I wrote this post, and the state of nearcasted interventions still looks reasonably similar to me. I have some slightly different thoughts on how we should relate to interventions around communication, but prioritizing communication relatively highly still seems reasonable to me.
One change in my perspective is that I’m now somewhat less excited about allocating larger fractions of resources toward specifically AI welfare. (I now think 0.2% seems better than 1%.) I’ve updated toward thinking safety concerns will get a smaller fraction of resources than I was previously expecting (due to more pessimism and shorter timelines), and I think safety and welfare resource usage might trade off.
Another change is that I’m relatively more excited about making deals with AIs as a safety intervention (as well as a welfare intervention). This would include things like paying them to reveal misalignment or promising later compensation if they don’t cause issues for us (and if we’re still in control). I have some forthcoming empirical work related to this, and I hope to also release work discussing the conceptual aspects.