…but I thought the criterion was unconditional preference? The point of nausea is precisely that agents can decide to act despite it; they’d just rather find a better solution (if their intelligence is up to the task).
Right; a preference being conditionally overwhelmed by other preferences does not make the presence of the overwhelmed preference conditional.
Or to phrase it another way, suppose I don’t like eating bread[1] (-1 utilons), but I do like eating cheese (100 utilons) and garlic (1000 utilons).
You ask me to choose between garlic bread (1000 − 1 = 999 utilons) and cheese (100 utilons); I pick the garlic bread.
The fact that I don’t like bread isn’t erased by the fact that I chose to eat garlic bread in this context.
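To make the arithmetic concrete, here is a minimal sketch of the same example. It assumes utilities simply add across ingredients, which the numbers above imply but which is not a claim about how preferences work in general; the names `PREFERENCES`, `utility`, and `options` are made up for illustration.

```python
# Per-ingredient utilities, using the numbers from the example above.
PREFERENCES = {"bread": -1, "cheese": 100, "garlic": 1000}


def utility(meal):
    """Additive utility of a meal given as a list of ingredients (an assumption)."""
    return sum(PREFERENCES[item] for item in meal)


# The two options on offer.
options = {"garlic bread": ["garlic", "bread"], "cheese": ["cheese"]}

# Pick the option with the highest total utility.
choice = max(options, key=lambda name: utility(options[name]))

print(choice, utility(options[choice]))
# The dislike of bread is still there, unchanged, after the choice is made.
print(PREFERENCES["bread"])
```

The choice comes out as garlic bread, yet the bread term is the same in every context; it is only outweighed here.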
It also seems to cover security (if we’re dead it won’t know), health (if we’re incapacitated it won’t know) and prosperity (if we’re under economic constraints, that impacts our free will). But I’m interested in considering possible failure modes.
This is aiming at a different problem than goal agnosticism; it’s trying to come up with an agent that is reasonably safe in other ways.
In order for these kinds of bounds (curiosity, nausea) to work, they need to incorporate enough of the human intent behind the concepts.
So perhaps there is an interpretation of those words that is helpful, but there remains the question “how do you get the AI to obey that interpretation,” and even then, that interpretation doesn’t fit the restrictive definition of goal agnosticism.
The usefulness of strong goal agnostic systems (like ideal predictors) is that, while they do not have properties like those by default, they make it possible to incrementally implement those properties.
This is aiming at a different problem than goal agnosticism; it’s trying to come up with an agent that is reasonably safe in other ways.
Well, assuming a robust implementation, I still think it obeys your criteria, but now that you mention “restrictive”, my understanding is that you want this expression to refer specifically to pure predictors. Correct?
If yes, I’m not sure that’s the best choice for clarity (why not “pure predictors”?), but of course that’s your choice. If not, can you give some examples of goal agnostic agents other than pure predictors?
you mention “restrictive”, my understanding is that you want this expression to refer specifically to pure predictors. Correct?
Goal agnosticism can, in principle, apply to things which are not pure predictors, and there are things which could reasonably be called predictors which are not goal agnostic.
A subset of predictors are indeed the most powerful known goal agnostic systems. I can’t currently point you toward another competitive goal agnostic system (rocks are uselessly goal agnostic), but the properties of goal agnosticism do, in concept, extend beyond predictors, so I leave the door open.
Also, by using the term “goal agnosticism” I try to highlight the value that arises directly from the goal-related properties, like statistical passivity and the lack of instrumental representational obfuscation. I could just use the more limited and implementation-specific “ideal predictors” I’ve used before, but in order to properly specify what I mean by an “ideal” predictor, I’d need to specify goal agnosticism.
I’d be happy if you could point out a noncompetitive one, or explain why my proposal above does not obey your axioms. But we seem to be getting diminishing returns from sorting these questions out, so maybe it’s time to close at this point and wish you luck. Thanks for the discussion!
utterly false for the record