what happens if this finds a way to satisfy values that the human actually has, but would not have if they had been able to do ELK on their own brain? for example, I’m pretty sure I don’t want to want some things I want, and I’m worried about s-risks from the scaled-up version of this: locking in networks of conflicting things that people currently truly want but truly wouldn’t want to truly want. I’m pretty sure mine are milder than this, but some people truly want to hurt others, in ways the other person doesn’t want, in order to get ahead, and would resist any attempt to remove that desire. given that these people can each have their own ai amplifier, what tool can be both probabilistically-verifiably trustable and also help both the ai and the human mutually discover ways to be aligned with others that neither could have discovered on their own?
I’ll be happy if AI gives people time/space/safety to figure out what they want while taking actions in the world that preserve option value.
The kind of AI alignment solution we’re working on isn’t a substitute for people deciding how they want to reflect and develop and decide what they value. The idea is that if AI is going to be part of that process, then the timing and nature of AI involvement should be decided by people rather than by “we need to deploy this AI now in order to remain competitive and accept whatever effects that has on our values.”
You could imagine AI solutions that try to replace the normal process of moral deliberation and reconciliation (rather than simply being a tool to help it), but I’ve never seen a proposal along those lines that didn’t seem really bad to me.