So, I’ve begun writing a new post, “A dry presentation of some empirical evidence on DI’s effectiveness”. (An attempt to serve that intended function of my original post with as high-quality a replacement as Misha’s post provided for the ‘theory sketch’ section.)
KPier very kindly offered to help me with editing, so I sent her the first seven-ish paragraphs I had written. She found one change to recommend, though she was somewhat ambivalent herself over which way was best. I wasn’t sure either, and found myself wondering what I’d decide in the end.
Then I started wondering what differences in responses there might be between a post where she made all the final decisions and a post where I did.
And then I thought… double-blind experiment! (Woot woot, raise the empirical roof. :P)
Here’s my idea:
I finish writing the post, get the ‘her final’ and ‘my final’ versions, and then make a post linking to both versions and explaining the experiment.
I’ll just label them version A and version B (flipping a coin to avoid any weird bias I may have about As and Bs, not that I’d anticipate much), and ask each reader to follow one or the other (chosen by their own coin flip, to avoid any weird bias they may have, and mostly just to keep the sample sizes for the two versions roughly equal).
Then people record their impressions and give me their feedback (without directly quoting the text), and I have to try to discriminate which readers got which version.
Does that sound like a neat idea? If it works well, it might even be worth creating an automated system for setting up and running such experiments (without all the coin flipping and link following), for people to use with appropriate posts.
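(To make the idea concrete, here’s a minimal sketch, in Python and with hypothetical file names, of how the assignment step of such a system might work: label the two drafts once at random, then give each arriving reader whichever version has fewer readers so far, breaking ties with a virtual coin flip, so nobody has to flip real coins and the samples stay balanced.)

```python
import random

def assign_version(counts):
    """Assign an arriving reader to version 'A' or 'B'.

    Prefers whichever version has fewer readers so far, so the two samples
    stay balanced; ties are broken by a virtual coin flip.
    """
    if counts["A"] < counts["B"]:
        choice = "A"
    elif counts["B"] < counts["A"]:
        choice = "B"
    else:
        choice = random.choice(["A", "B"])
    counts[choice] += 1
    return choice

# Label the two drafts once, by coin flip, then hand out assignments.
drafts = ["her_final.html", "my_final.html"]  # hypothetical file names
random.shuffle(drafts)
labels = {"A": drafts[0], "B": drafts[1]}

counts = {"A": 0, "B": 0}
for reader in ["reader_1", "reader_2", "reader_3", "reader_4"]:
    version = assign_version(counts)
    print(reader, "-> version", version, f"({labels[version]})")
```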
Luke did such a test recently. It’s probably useful for feedback (right now, his two versions are at 20 and 3 karma), but really annoying for commenters. I would recommend getting some beta testers instead (I volunteer). Even a small sample of readers should be able to catch most relevant problems.
Thanks! I did think it sounded annoying for commenters, and I don’t want to try the general audience’s patience much further at this point, which is why I’m just asking a few people what they think of it in the comments.
Being able to calibrate myself objectively is an extremely attractive idea, though.
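(Concretely, that calibration check could be as simple as counting how many readers I sorted into the right version and comparing it with the 50% hit rate blind guessing would give. A minimal sketch, with made-up numbers just to show the arithmetic:)

```python
from math import comb

def chance_of_at_least(n, k, p=0.5):
    """Probability of getting k or more guesses right out of n by pure chance."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Hypothetical numbers: 14 readers left feedback and I matched 11 of them
# to the right version.
n_readers, n_correct = 14, 11
print(f"hit rate: {n_correct / n_readers:.0%}")
print(f"chance of doing at least that well by guessing: "
      f"{chance_of_at_least(n_readers, n_correct):.3f}")
```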
Most excellent Gwern!
I have a proposition! (It’s the same double-blind two-version experiment described above.)
It’s been done before, but not often, so I infer it doesn’t work well. Possibly this is just due to clumsy implementation.