To the extent that I understand your position, it's that sharing a lot of values doesn't automatically imply that an AI is safe/non-dystopian with respect to your values if built, rather than that alignment to someone's values is hard/impossible (note: when I say a model is aligned, I always mean aligned to one person's values).
Yes, with the caveat that I'm not thereby claiming it's easy to align an AI even to one person's values.
Fair enough.
I admittedly have a lot of agreement with you, and that's despite thinking we can make machines that follow orders/are intent-aligned à la Seth Herd's definition:
https://www.lesswrong.com/posts/7NvKrqoQgJkZJmcuD/instruction-following-agi-is-easier-and-more-likely-than