(Just typing as I think...)
What if I push this line of thinking to the extreme? If I just pick agents randomly from the space of all agents, then this should be maximally random, and that should be even better. But now the part where we can mine information about alignment from the fact that humans are at least somewhat aligned is gone. So this seems wrong. What is wrong here? Probably the fact that if you pick agents randomly from the space of all agents, you don't get greater variation in alignment compared to picking random humans, because probably all the random agents you pick are just not aligned.
So what is doing most of the work here is that humans are more aligned than random, which I expect you to agree with. What you are also saying (I think) is that the tail-end level of alignment in humans is in some way more important than the mean level of alignment in humans. Because if we have the human distribution, we are just a few bits from locating the tail of the distribution. E.g. we are about 10 bits away from locating the top 0.1 percent, since log2(1000) ≈ 10. And because the tail is what matters, randomness is in our favor.
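To make the bit counting concrete, here is a minimal sketch of the arithmetic. It assumes that "bits away" means the number of bits of selection needed to single out the top fraction p of a distribution, i.e. -log2(p). The function name `bits_to_locate` is my own and purely illustrative.

```python
import math


def bits_to_locate(top_fraction: float) -> float:
    """Bits of selection needed to single out the top `top_fraction` of a distribution.

    Assumption: locating a subset that makes up a fraction p of the
    population costs -log2(p) bits.
    """
    if not 0.0 < top_fraction <= 1.0:
        raise ValueError("top_fraction must be in (0, 1]")
    return -math.log2(top_fraction)


# Top 0.1 percent of the human distribution, i.e. 1 in 1000.
print(round(bits_to_locate(0.001), 2))  # 9.97
```

So "10 bits" is a slight round-up of log2(1000). Note that this only counts the cost of selecting within a given distribution; it says nothing about whether the tail of that distribution is actually aligned, which is the part that depends on humans being more aligned than random agents.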
Does this capture what you are trying to say?