That seems like a useful decomposition! Point 2 seems to beg the question: why does it assume that the brain can “ensure alignment with its values”, as opposed to, say, synthesizing an illusion of values by aggregating data from various shards?
Thanks for the comment. I take “beg the question” to mean “assumes its conclusion,” but it seems like you just mean Point 2 assumes something you disagree with, which is fair. I can see reasonable definitions of aligned and misaligned in which brains would fall into either category. For example, insofar as our values are of a certain evolutionary sort (e.g., valuing reproduction), human brains exhibit misaligned mesa-optimization, like craving sugar. If sugar craving itself is the value, then arguably we’re well-aligned.
In terms of synthesizing an illusion, what exactly would make it illusory? If the synthesis (i.e., combination of the various shards and associated data) is leading to brains going about their business in a not-catastrophic way (e.g., not being constantly insane or paralyzed), then that seems to meet the bar for alignment that many, particularly agent foundations proponents, favor. See, for example, Nate’s recent post:
Unfortunately, the current frontier for alignment research is “can we figure out how to point AGI at anything?”. By far the most likely outcome is that we screw up alignment and destroy ourselves.
The example I like is just getting an AI to fill a container of water, which human brains are able to do, but in Fantasia, the sorcerer’s apprentice Mickey Mouse was not able to do! So that’s a basic sense in which brains are aligned, but again I’m not sure how exactly you would differentiate alignment with its values from synthesis of an illusion.
I meant this:
Shard Question: How does the human brain ensure alignment with its values, and how can we use that information to ensure the alignment of an AI with its designers’ values?
which does indeed beg the question in the standard meaning of it.
My point is that there is very much no alignment between different values! They are independent at best and contradictory in many cases. There is an illusion of coherent values that is a rationalization. The difference in values sometimes leads to catastrophic Fantasia-like outcomes on the margins (e.g. people with addiction don’t want to be on drugs but are), but most of the time it results in a mild akrasia (I am writing this instead of doing something that makes me money). This seems like a good analogy: http://max.mmlc.northwestern.edu/mdenner/Demo/texts/swan_pike_crawfish.htm
Hm, the meaning of “begging the question” is probably just a verbal dispute, but I don’t think asking questions can in general beg questions, because questions don’t have conclusions. There is no “assuming its conclusion is true” if there is no conclusion. Not a big deal though!
I wouldn’t say values are independent (i.e., orthogonal) at best; they are often highly correlated, such as values of “have enjoyable experiences” and “satisfy hunger” both leading to eating tasty meals. I agree they are often contradictory, and this is one valid model of catastrophic addiction or mild problems. I think any rigorous theory of “values” (shard theory or otherwise) will need to make sense of those phenomena, but I don’t see that as an issue for the claim “ensure alignment with its values” because I don’t think alignment requires complete satisfaction of every value, which is almost always impossible.
Hm. I think you can dissolve the perceived question-begging by replacing “values” with its substance:
How does the genome, in the presence of e.g. modern Western culture, reliably form decision-influences which push the person to e.g. take actions which increase the welfare of their family and friends? (i.e. where do friendship-shards come from?)
We’re then asking a relatively well-defined question with a guaranteed-to-exist answer.