So… why does this post have such a low rating? Comments? I find it bewildering. If you’re interested in LessWrong, you should be interested in finding out under what conditions people become less wrong.
Posts with a lot of math require me to set aside larger chunks of time to consume them. I do want to examine this but that won’t be possible until later this week, which means I don’t vote on it until then.
Thanks—good to know.
You haven’t shown that your experiment will do so. Nor have you shown that your experiment models the situation well.
What would it take to show that? It seems to me that this isn’t something I could “show”, even in theory, since I’ve found no existing empirical data on Aumann-agreement-type experiments in humans. If you know of one, I’d appreciate a comment describing it.
I believe that one of the purposes of LessWrong is to help us gain an understanding of important epistemic issues. Proposing a new way to study the issue and potentially gain insight is therefore important.
I think that your standard implies that LessWrong is like a peer-reviewed journal: a place for people to present completed research programs, not a place for people to cooperate to find answers to difficult problems.
As I’ve said before, it’s not good to apply standards that penalize rigor. If the act of putting equations into a post means that each equation needs to be empirically validated in order to get an upvote, pretty soon nobody is going to put equations into their posts.
I’m perfectly happy to come back and vote this up after I am satisfied that it is good, and I haven’t and won’t vote it down. I think it’s a good idea to seek public comment, but the voting is supposed to indicate posts which are excellent for public consumption—this isn’t, unless it’s the technical first half of a pair of such posts. I want to know that the formalization parallels the reality, and it’s not clear that it does before it is run.
So, you don’t want to vote until you see the results; and I don’t want to waste an entire day writing up the results if few people are interested. Is there a general solution to this general problem?
(The “Part 1” in the title was supposed to indicate that it is the first part of a multi-part post.)
If you are confident in the practical value of your results, I would recommend posting. Otherwise I can’t help you.
I held off on rating the post: I just skimmed it, saw that most of it was describing an algorithm/model, and decided I didn’t have time to check your work. I might not be representative; I don’t rate most posts, and I’ve rated just 6 top-level posts so far this May.
Hmm—I wish I could see whether I have few upvotes, or numerous upvotes and downvotes. They’d have very different implications for what I should do differently.
I’m rather tired of the Sleeping Beauty debate and so didn’t read it. If others have had the same reaction this might explain the low score.
Thanks for answering. This isn’t a continuation of the Sleeping Beauty debate, despite what you see in the comment section, which has been hijacked by Sleeping Beauty.
One thing I think is missing from your model is correlation between different answers, and I think that this is actually essential to the phenomenon: ignoring it makes it look like people are failing to come to agreement at all, when what’s actually happening is that they’re aligning into various ideological groups.
That is, there’s a big difference between a group of 100 people with independent answers on 10 binary questions (random fair coinflips), and two groups of 50 who disagree on each of the 10 binary questions. I think that if you compared LW newcomers with veterans, you’d find that the newcomers more resemble the first case, and veterans more the second. This would suggest that people’s answers are becoming more internally coherent, at least.
In particular, I expect that on this subject the veterans split roughly as follows:
Those who subscribe to Bostrom’s SIA and are Thirders (1⁄3 to 1⁄2 of the LW vets)
Those who subscribe to Bostrom’s SSA and are Halfers (less than 1⁄4)
Those who reject Bostromian anthropic probabilities entirely (less than 1⁄4)
One can easily predict the responses of the first two groups on subsequent questions.
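A minimal sketch of the distinction being drawn here (illustrative only; this is not code from the post or from anyone in this thread): both toy populations below disagree heavily question by question, but only the second has the two-camp structure.

```python
# Illustrative only: 100 people answering 10 binary questions, either
# independently at random or as two 50-person camps that disagree on everything.
import random

QUESTIONS, PEOPLE = 10, 100

independent = [[random.choice([0, 1]) for _ in range(QUESTIONS)]
               for _ in range(PEOPLE)]

camp_a = [random.choice([0, 1]) for _ in range(QUESTIONS)]
two_camps = [list(camp_a) if p < PEOPLE // 2 else [1 - a for a in camp_a]
             for p in range(PEOPLE)]

def pairwise_agreements(answers):
    """Fraction of questions on which each pair of people agrees."""
    out = []
    for i in range(PEOPLE):
        for j in range(i + 1, PEOPLE):
            out.append(sum(a == b for a, b in zip(answers[i], answers[j]))
                       / QUESTIONS)
    return out

# Mean agreement is about the same in both populations; what differs is that in
# the second one every pair either agrees on everything or agrees on nothing.
for name, data in [("independent", independent), ("two camps", two_camps)]:
    agr = pairwise_agreements(data)
    extreme = sum(a in (0.0, 1.0) for a in agr) / len(agr)
    print(name, "mean agreement:", round(sum(agr) / len(agr), 2),
          "all-or-nothing pairs:", round(extreme, 2))
```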
I don’t build a model by looking at the observed results of a phenomenon and building in a special component to produce each observed result. You wouldn’t learn anything from your models if you did that; they would produce what you built them to produce. I build a model by enumerating the inputs, modeling each input, and seeing how much of the observed results the output matches.
When I run the simulation, people do in fact align into different groups. So far, always 2 groups. But the alignment process doesn’t give either group better overall accuracy. This shows that you don’t need any internal coherence or problem understanding for people to align into groups. Attributing accuracy to people who tend to agree with you, and inaccuracy to those who disagree with you, produces saddle-point dynamics. Once the initial random distribution gets off the saddle point, the groups on the opposite sides each rapidly converge to their own attractor.
What’s especially interesting is that this way of judging people’s accuracy doesn’t just cause different groups to converge to different points; it causes the groups to disagree with each other on every point. There isn’t one “right” group and one “wrong” group; there are two groups that are right about different things. Agreement within a group on some topics indirectly causes its members to take the opposite opinion on any topic on which other groups have strong opinions. In other words: my enemy’s belief that P is evidence against P.
(Sleeping Beauty isn’t the subject of this post.)
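For concreteness, here is a rough stand-in for the alignment dynamics described above. This is not the post’s Perl script: the update rule, constants, and sizes are all assumptions made for illustration, so it may not split the population exactly the way the actual simulation does.

```python
# Agents rate each other by agreement with themselves; "trusted" agents pull
# an agent's beliefs toward theirs, "distrusted" agents push them the other way.
import random

N_AGENTS, N_QUESTIONS, N_ROUNDS, LR = 20, 50, 40, 0.2

truth = [random.choice([0, 1]) for _ in range(N_QUESTIONS)]
accuracy = [random.random() for _ in range(N_AGENTS)]   # p_i drawn from 0..1

def initial_belief(i, q):
    """Agent i's starting credence that question q's answer is 1."""
    guess = truth[q] if random.random() < accuracy[i] else 1 - truth[q]
    return 0.8 if guess == 1 else 0.2

belief = [[initial_belief(i, q) for q in range(N_QUESTIONS)]
          for i in range(N_AGENTS)]

for _ in range(N_ROUNDS):
    # trust[i][j]: fraction of questions on which j currently agrees with i
    trust = [[sum((belief[i][q] > 0.5) == (belief[j][q] > 0.5)
                  for q in range(N_QUESTIONS)) / N_QUESTIONS
              for j in range(N_AGENTS)] for i in range(N_AGENTS)]
    new_belief = []
    for i in range(N_AGENTS):
        row = []
        for q in range(N_QUESTIONS):
            # agents who agree with i more than half the time pull i toward
            # their view; agents who agree less than half push i the other way
            pull = sum((trust[i][j] - 0.5) * (belief[j][q] - 0.5)
                       for j in range(N_AGENTS) if j != i)
            row.append(min(0.99, max(0.01,
                                     belief[i][q] + LR * pull / N_AGENTS)))
        new_belief.append(row)
    belief = new_belief

# Final agreement of everyone with agent 0: a two-camp split shows up as values
# near 1.0 (same camp) and near 0.0 (the opposite camp).
agree_with_0 = [sum((belief[0][q] > 0.5) == (belief[j][q] > 0.5)
                    for q in range(N_QUESTIONS)) / N_QUESTIONS
                for j in range(1, N_AGENTS)]
print(sorted(round(a, 2) for a in agree_with_0))
```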
OK, I see what you’re doing now. It’s an interesting model, though one feature jumps out at me: the conclusion that my enemy’s belief that P is evidence against P.
Although this phenomenon is a well-known fallacy among human beings, it doesn’t seem like it should be rational behavior. And then I noticed that the probabilities p_i can be less than 1⁄2 in your model, and that some of your agents are in fact reliably anti-correct. This seems like a probable cause of a binary group split, if I’m understanding correctly.
What’s the result if you make the probabilities (and accordingly, people’s estimates of the probabilities) range from 1⁄2 to 1 instead of from 0 to 1?
Then everybody converges on the correct answer to every question. And you’ve just answered the question of why Bayesians should agree to agree: because Bayesians can’t perform worse than random on average, their accuracies range from 1⁄2 to 1 and are not biased on any problem (unless the evidence is biased, in which case you’re screwed anyway). Averaging their opinions together will thus get the right answer to every (answerable) question. Congratulations! You win 1 Internet!
(The reason for choosing 0 to 1 is explained in the post.)
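A quick sanity check of that claim (my own sketch, not the post's code; a simple majority vote stands in for averaging the opinions):

```python
# When accuracies are drawn from 1/2..1 the pooled answer is almost always
# right; when they are drawn from 0..1 it is no better than chance.
import random

def majority_correct_rate(low, n_agents=25, n_trials=2000):
    correct = 0
    for _ in range(n_trials):
        accuracy = [random.uniform(low, 1.0) for _ in range(n_agents)]
        votes_right = sum(random.random() < p for p in accuracy)
        if votes_right > n_agents / 2:
            correct += 1
    return correct / n_trials

print("accuracies in 0..1:  ", majority_correct_rate(0.0))   # about 0.5
print("accuracies in 0.5..1:", majority_correct_rate(0.5))   # close to 1.0
```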
As for whether this should be rational behavior: the behavior in my model is rational if the results indicate that it gets the right answer. So far, it looks like it doesn’t.
As for the reliably anti-correct agents: you could probably get the same effect by having some problems, rather than some agents, be the ones that usually get answered wrong. An abundance of wrong answers makes the agents split. The agents don’t split into the correct agents and the incorrect agents, at least not under the conditions I’ve tested; there are doubtless settings that would get them to do that.
Does the 2-group split stay even if you continue the simulation until all answers have been revealed?
If you increase the standard deviation of p[i] so there are more very right and very wrong guessers, do they tend to split more into right and wrong groups? I expect they would.
Good question—no; revelation of answers eventually causes convergence into 1 group.
It makes the splitting happen faster.
It also didn’t get a lot of on-topic comments. Possibly because guessing the answers to your questions seems the wrong way to answer them—the correct way being to put it to the test with the program, which means rewriting it (wasteful) or waiting for you to post it.
Are you planning on posting the Perl script? I’m a bit tempted to just translate what you’ve got in the post into Python, but realistically I probably won’t get around to it anytime soon.
I think there’s a way to upload it to LessWrong and post a link to it. But I don’t know how. My email is at gmail.
Summarizing the results in the same post would result in a gigantic post that people wouldn’t want to read.
The code could be cleaner. Couldn’t [the agreement expression in the script] be written as [a not(xor) form] or as [a second equivalent form]?
It would clean up the code a lot, and make it less of a hassle to read. I’d also prefer higher order functions to for loops, but that may just be me.
The code is written that way to accommodate the continuous case. I think people who aren’t C or assembly programmers will find the not(xor) more confusing, and people who are programmers will find the second unfamiliar.
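To make the trade-off concrete, a hypothetical illustration (the actual Perl isn’t shown in this thread, and 1 - |a - b| is only one plausible reading of “the continuous case”):

```python
# For 0/1 answers the first two tests are equivalent; only the arithmetic form
# still makes sense once answers become probabilities.
def agree_not_xor(a, b):
    # C-style: true exactly when two 0/1 answers are equal
    return not (a ^ b)

def agree_eq(a, b):
    # plain equality: clearest for binary answers, but it has no obvious
    # generalization once answers are probabilities
    return a == b

def agreement_continuous(a, b):
    # degree of agreement between two credences in [0, 1]; reduces to the
    # binary tests when a and b are exactly 0 or 1
    return 1 - abs(a - b)

print(agree_not_xor(1, 1), agree_eq(1, 1), agreement_continuous(1.0, 1.0))
print(agree_not_xor(1, 0), agree_eq(1, 0), agreement_continuous(1.0, 0.0))
print(agreement_continuous(0.9, 0.3))   # only the continuous form applies here
```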
I’m mainly saying the code is a bit opaque at the moment.
If you want to keep the continuous case, fine.
As long as you defined the same or a similar function somewhere else, programmers would be fine.
Commenting the code would help people get to grips with it, if you don’t want to change it.
Good idea. Comments it is.