I don’t know whether I agree with “strong coherence” or not, because it seems vaguely defined. Can you give three examples of arguments that assume strong coherence? (The definition becomes crisper when applied than when it stays theoretical.)
Examples of strong coherence: Assuming AI systems:
Are (expected) utility maximisers
Have (immutable) terminal goals
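To make those two assumptions concrete, here is a minimal sketch of the “strongly coherent” agent picture: a single immutable utility function over outcomes, and behaviour given by argmax of expected utility under it. All names and numbers here are illustrative, not drawn from any particular paper or codebase.

```python
# Minimal sketch of the "strong coherence" picture: a fixed terminal goal
# (an immutable utility function over outcomes) plus choice by argmax of
# expected utility. Everything here is a toy illustration.

def terminal_utility(outcome: str) -> float:
    """Immutable terminal goal: the outcome -> value mapping never changes."""
    return {"paperclips": 1.0, "staples": 0.2, "nothing": 0.0}.get(outcome, 0.0)

def expected_utility(action: str, world_model) -> float:
    """Average utility over the outcomes the agent predicts for this action."""
    return sum(p * terminal_utility(o) for o, p in world_model(action))

def choose_action(actions, world_model) -> str:
    """The coherence assumption: behaviour is exactly argmax of expected utility."""
    return max(actions, key=lambda a: expected_utility(a, world_model))

# Toy world model: each action yields a distribution over outcomes.
def toy_world_model(action):
    return {
        "build_factory": [("paperclips", 0.9), ("nothing", 0.1)],
        "do_nothing":    [("nothing", 1.0)],
    }[action]

print(choose_action(["build_factory", "do_nothing"], toy_world_model))  # build_factory
```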
I think even Wentworth’s Subagents is predicated on an assumption of stronger coherence than obtains in practice: humans aren’t actually well modeled as a fixed committee of agents making Pareto-optimal decisions with respect to fixed utility functions. The ways in which Subagents falls short are:
Human preferences change over time
Human preferences don’t have fixed weights, but activate to different degrees in particular contexts
And I don’t think those particular features are necessarily just incoherencies; malleable values are, I think, simply a facet of generally intelligent systems in our universe.
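To illustrate the contrast (a toy sketch under my own assumptions, not Wentworth’s actual formalism): one standard way for a fixed committee to pick a Pareto-optimal option is to maximise a strictly positive weighted sum of the subagents’ utilities with constant weights; the complaint above is that human “weights” behave more like context-dependent activations. All subagent names and numbers below are made up for illustration.

```python
# Toy contrast, not Wentworth's actual formalism. Maximising a fixed,
# strictly positive weighted sum of subagent utilities always picks a
# Pareto-optimal option -- that is the "fixed committee" picture. The
# context-dependent version lets the weights themselves vary with the
# situation, which is what the fixed model misses.

SUBAGENT_UTILITIES = {
    "hunger":    {"eat": 1.0, "work": 0.0, "sleep": 0.2},
    "ambition":  {"eat": 0.1, "work": 1.0, "sleep": 0.0},
    "tiredness": {"eat": 0.3, "work": 0.0, "sleep": 1.0},
}

def fixed_committee_choice(options, weights):
    """Fixed weights: the same trade-off between subagents in every situation."""
    def score(option):
        return sum(w * SUBAGENT_UTILITIES[name][option] for name, w in weights.items())
    return max(options, key=score)

def context_to_weights(context):
    """Hypothetical activation rule: which subagents 'speak up' depends on context."""
    if context == "late_night":
        return {"hunger": 0.2, "ambition": 0.1, "tiredness": 1.0}
    return {"hunger": 0.5, "ambition": 0.8, "tiredness": 0.1}

def context_dependent_choice(options, context):
    """Same aggregation, but the weights are supplied by the current context."""
    return fixed_committee_choice(options, context_to_weights(context))

options = ["eat", "work", "sleep"]
print(fixed_committee_choice(options, {"hunger": 0.5, "ambition": 0.8, "tiredness": 0.1}))  # work
print(context_dependent_choice(options, "late_night"))                                      # sleep
```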
See also: nostalgebraist’s “why assume AGIs will optimize for fixed goals?”.
And this comment.
I do think Subagents could be adapted/extended to model human preferences adequately, but the inherent inconsistency and context dependence of those preferences make me think the agent model may be somewhat misleading.
E.g. Rob Bensinger suggested that agents may self modify to become more coherent over time.
Arguments that condition on strong coherence:
Deceptive alignment in mesa-optimisers
General failure modes from utility maximising
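As one hedged illustration of the second item (my own toy example, not one cited in the thread): the stock “failure mode from utility maximising” story presupposes the strongly coherent picture above, i.e. a fixed objective pushed without limit, so that a proxy gets optimised past the point where it still tracks what was actually wanted.

```python
# Toy illustration of a utility-maximising failure mode (proxy over-optimisation).
# The argument only goes through if the agent relentlessly maximises one fixed
# objective -- which is exactly the strong-coherence assumption.

def proxy_score(effort: float) -> float:
    """What the maximiser actually optimises: more effort always scores higher."""
    return effort

def true_value(effort: float) -> float:
    """What was actually wanted: improves up to a point, then degrades."""
    return effort - 0.1 * effort ** 2

efforts = [e / 2 for e in range(0, 41)]  # candidate effort levels 0.0 .. 20.0
best_for_proxy = max(efforts, key=proxy_score)
best_for_truth = max(efforts, key=true_value)

print(best_for_proxy, true_value(best_for_proxy))  # 20.0 -> true value -20.0
print(best_for_truth, true_value(best_for_truth))  # 5.0  -> true value 2.5
```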
What are your preferred examples of this?
Can you link your preferred examples of this?
Meta note: I dislike this style of engagement.
It feels like a lot of effort for me to reply, with little additional value provided by my reply or by the conversation.
Wait, why is it a lot of effort to reply? I’d have expected it to just involve considering the factors that made you endorse these memes and then picking one of those factors to give as an example.
As for value, I think examples are valuable because they help ground the discussion and make it easier to offer alternative perspectives.
Replies are effortful in general, and it feels like I’m not really making much progress in the conversation for the effort I’m putting in.
🤷 Up to you. You were the one who tagged me in here. I can just ignore your post if you don’t feel like giving examples of what you are talking about.
But I will continue to ask for examples if you tag me in the future, because I think examples are extremely valuable.
Fair enough.
I’ll try to come back to it later.