while encouraging for dialog, I don’t think he has a very good model of what implications his attempts to create scientist ai have for the ways an executive ai could attack us. even setting aside mesaoptimization (which his approaches are absolutely not safer from), an executive ai doesn’t need to be very strong at first escape to kill us all; it only needs to know how to use a scientist ai to finish the job. and any science results we get from human-steered scientist ai make that much easier for the executive ai, since it won’t even need to run the experiments itself.

banning executive ai will be incredibly hard and will likely permanently curtail human freedom to compute. that makes such bans likely to be highly ideologically laden one way or another, unless normally-conflicting non-centrist ideologies[1] can maintain classical liberal “value compromise” centrism long enough to find a way to guarantee that the constraints preventing ai executives still let everyday people keep their general freedom of non-ai computation, ideally even including doing research using ai scientists. auth versions of ideologies are gonna want to control this outcome, and that’s worrying, since both progressive and regressive ideologies in the US have been going auth.
on the other hand, I do think that successfully banning dangerous ai in a way compatible with all ideologies is the most promising practical approach anyone has proposed, besides formal goal alignment and low-impact-bounded informal goal alignment, both of which still have major open research problems.
the issue I still see is—how do you recognize an ai executive that is trying to disguise itself?
[1] a sketch of non-centrist ideologies might be: authoritarian progressive/center/regressive, libertarian progressive/center/regressive, plus another dimension of community-first/multiscale/individual-first that can cut across the others. if you want me to hunt down citations for why I think this is a good map of ideologies, please comment and I will. I consider myself libertarian progressive multiscale.
the issue I still see is—how do you recognize an ai executive that is trying to disguise itself?
It can’t disguise itself without first researching disguise methods. The question is whether interpretability tools will be up to the task of catching it.
This will not work for catching an AI executive that originates outside of a controlled environment (unless it queries an AI scientist). But given that such attempts will originate from uncoordinated, relatively computationally underpowered sources, it may be possible to preemptively enumerate the disguising techniques such an AI executive could come up with. If there are undetectable varieties... well, it’s mostly game over.