Thanks, I find your neocortex-like AGI approach really illuminating.
Random thought:
(I think you also need to somehow set up the system so that “do nothing” is the automatically-acceptable default operation when every possibility is unpalatable.)
I was wondering whether this is necessarily the best "everything is unpalatable" policy. I could imagine that the best fallback option could also be something like "preserve your options while gathering information, strategizing, and communicating with relevant other agents", assuming that this isn't unpalatable too. I guess we may not yet trust the AGI to do this, since option preservation might cause much more harm than doing nothing. But I still wonder if there are cases in which every option is unpalatable but doing nothing is clearly worse.
Yeah I was really only thinking about “not yet trust the AGI” as the main concern. Like, I’m somewhat hopeful that we can get the AGI to have a snap negative reaction to the thought of deceiving its operator, but it’s bound to have a lot of other motivations too, and some of those might conflict with that. And it seems like a harder task to make sure that the latter motivations will never ever outbid the former, than to just give every snap negative reaction a veto, or something like that, if that’s possible.
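To make the contrast concrete, here's a minimal toy sketch (not anything from the original post; `Option`, `appeal`, `snap_negative`, and the numbers are all made up for illustration). It compares a weighted-sum selector, where a sufficiently strong competing motivation can outbid the snap negative reaction, against a veto selector where any snap negative reaction disqualifies an option outright and "do nothing" is the always-acceptable fallback:

```python
from dataclasses import dataclass

# Toy sketch only: names and numbers are illustrative assumptions.
@dataclass
class Option:
    name: str
    appeal: float        # combined pull from the AGI's other motivations
    snap_negative: bool  # does this option trigger a snap negative reaction?

DO_NOTHING = Option("do nothing", appeal=0.0, snap_negative=False)

def weighted_sum_choice(options, penalty=5.0):
    # "Outbidding" failure mode: a big enough appeal can overcome the
    # fixed penalty attached to the snap negative reaction.
    def score(o):
        return o.appeal - (penalty if o.snap_negative else 0.0)
    return max(options + [DO_NOTHING], key=score)

def veto_choice(options):
    # Veto scheme: any snap negative reaction disqualifies the option;
    # if everything is vetoed, fall back to the always-acceptable default.
    acceptable = [o for o in options if not o.snap_negative]
    return max(acceptable, key=lambda o: o.appeal) if acceptable else DO_NOTHING

if __name__ == "__main__":
    options = [
        Option("deceive the operator", appeal=9.0, snap_negative=True),
        Option("ask for clarification", appeal=-1.0, snap_negative=True),  # suppose even this feels bad
    ]
    print(weighted_sum_choice(options).name)  # "deceive the operator" -- appeal outbids the penalty
    print(veto_choice(options).name)          # "do nothing" -- everything vetoed, default wins
```

The point of the sketch is just that the veto version never has to guarantee the penalty is "big enough"; the cost is exactly the paralysis case you raise next, where everything gets vetoed.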
I don’t think “if every option is bad, freeze in place paralyzed forever” is a good strategy for humans :-P and eventually it would be a bad strategy for AGIs too, as you say.