Thanks for the example. It leads me to questions:

For more complicated propositions, who does the math and statistics? The application apparently gathers the data, but it is still subject to interpretation.
Is the data (presumably anonymized) made publicly available, so that others can dispute the meaning?
If the sponsoring company does its own math and stats, must it publicly post its working papers before making claims based on the data? Does anyone review that to make sure it passes some light smell test, and isn’t just pictures of cats?
What action does the organization behind the app take if a sponsor publicly misrepresents the data or, more likely, its meaning? If the organization would take action, does it take the same action if the statement is merely misleading, rather than factually incorrect?
What do the participants get? Is that simply up to the sponsor? If so, who reviews it to assure that the incentive does not distort the data? If no one, will you at least require that the incentive be reported as part of the trial?
Does a sponsor have any recourse if it designed the trial badly, leading to misleading results? Or is its remedy really to design a better trial and publicize that one?
Can a sponsor run a private mini-trial to test its trial design before going full bore (presumably, with a promise not to publicize the results)?
Have you considered some form of reputation system, allowing commenters to build a reputation for debunking badly supported claims and affirming well-supported claims? (Or perhaps some other goodie?) I can imagine it becoming a pastime for grad students, which would be a Good Thing (TM).
I imagine these might all be very basic questions that arise out of my ignorance of such studies. If so, please spend your time on people with more to contribute than ignorance!

Max L.
7 - Can a sponsor run a private mini-trial to test its trial design before going full bore (presumably, with a promise not to publicize the results)?
This is an awesome idea. I had not considered this until you posted it. This sounds great.
6 - Does a sponsor have any recourse if it designed the trial badly, leading to misleading results? Or is its remedy really to design a better trial and publicize that one?
This is a hard one. I anticipate that at least initially only Good People will be using this protocol. These are people who spent a lot of time creating something to (hopefully) make the world better. Not cool to screw them if they make a mistake, or if v1 isn’t as awesome as anticipated.
A related question is: what can we do to help a company that has demonstrated its effectiveness?

This is exactly the moral hazard companies face with the normal procedure too. The main advantage I see is that the webapp approach is much cheaper, allowing companies to do it early and thus reducing the moral hazard.
3 - If the sponsoring company does its own math and stats, must it publicly post its working papers before making claims based on the data? Does anyone review that to make sure it passes some light smell test, and isn’t just pictures of cats?
At a minimum, the code used should be posted publicly and open-source licensed (otherwise there can be no scrutiny or replication). I also think paying to have a third party review the code isn’t unreasonable.
2 - Is the data (presumably anonymized) made publicly available, so that others can dispute the meaning?
That was the initial plan, yes! Beltran (my co-founder at GB) is worried that this will result in HIPAA issues or something similar, so I’m ultimately unsure. Putting structures in place so the science is right the first time seems better.
The privacy issue here is interesting.

It makes sense to guarantee anonymity. Participants recruited personally by company founders may otherwise be unwilling to report honestly (for example). For health-related studies, privacy is an issue for insurance reasons, etc.
However, for follow-up studies, it seems important to keep earlier records including personally identifiable information so as to prevent repeatedly sampling from the same population.
That would imply that your organization/system needs to have a data management system for securely storing the personal data while making it available in an anonymized form.
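One way to square those two needs is to publish a stable keyed-hash pseudonym instead of the identity itself. This is a minimal sketch, assuming a hypothetical secret key held only by the organization; the function name and key are illustrative, not part of any existing system:

```python
import hashlib
import hmac

# Hypothetical secret held only by the organization running the trials;
# without it, pseudonyms cannot be linked back to identities.
SECRET_KEY = b"replace-with-a-real-secret"

def participant_pseudonym(email: str) -> str:
    """Derive a stable pseudonym from an identifier.

    The same person always maps to the same pseudonym, so follow-up
    studies can exclude prior participants without the public dataset
    ever containing the raw email address.
    """
    normalized = email.strip().lower().encode("utf-8")
    return hmac.new(SECRET_KEY, normalized, hashlib.sha256).hexdigest()[:16]
```

A follow-up study would then check new sign-ups’ pseudonyms against the stored list from earlier rounds, preventing repeated sampling from the same population.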
However, there are privacy risks associated with ‘anonymized’ data as well, since this data can sometimes be linked with other data sources to make inferences about participants. (For example, if participants provide a zip code and certain demographic information, that may be enough to narrow it down to a very few people.) You may want to consider differential privacy solutions or other kinds of data perturbation.

http://en.wikipedia.org/wiki/Differential_privacy
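As a concrete instance of such perturbation, here is a minimal sketch of the Laplace mechanism for a counting query; the epsilon value is an illustrative assumption, not a recommendation:

```python
import random

def dp_count(true_count: int, epsilon: float = 0.5) -> float:
    """Release a count with Laplace noise (sensitivity 1, scale 1/epsilon).

    Smaller epsilon means stronger privacy and noisier answers; the
    published number no longer pins down any single participant.
    """
    scale = 1.0 / epsilon
    # Difference of two iid Exp(1) draws is Laplace(0, 1); rescale to 1/epsilon.
    noise = scale * (random.expovariate(1.0) - random.expovariate(1.0))
    return true_count + noise
```

Each released statistic consumes some of the privacy budget, so in practice the number of queries against a dataset also has to be limited.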
8 - Have you considered some form of reputation system, allowing commenters to build a reputation for debunking badly supported claims and affirming well-supported claims? (Or perhaps some other goodie?) I can imagine it becoming a pastime for grad students, which would be a Good Thing (TM).
I hadn’t. I like the idea, but am less able to visualize it than the rest of this stuff. Grad students cleaning up marketing claims does indeed sound like a Good Thing...
I was thinking something like the karma score here. People could comment on the data and the math that leads to the conclusions, and debunk the ones that are misleading. A problem is that if you allow endorsers, rather than just debunkers, you could get into a situation where a sponsor pays people to publicly accept the conclusions. Here are my thoughts on how to avoid this.
First, we have to simplify the issue down to a binary question: does the data fairly support the conclusion that the sponsor claims? The sponsor would offer $x for each of the first Y reviewers with a reputation score of at least Z. They have to pay regardless of what the reviewer’s answer to the question is.

If the reviewers are unanimous, then they all get small bumps to their reputation. If they are not unanimous, then they see each others’ reviews (anonymously and non-publicly at this point) and can change their positions one time. After that, those who are in the final majority and did not change their position get a bump up in reputation, but only based on the number of reviewers who switched to be in the final majority. (I.e., we reward reviewers who persuade others to change their position.)

The reviews are then opened to a broader number of people with positive reputations, who can simply vote yes or no, which again affects the reputations of the reviewers. Again, voting is private until complete; then people who vote with the majority get small reputation bumps. At the conclusion of the process, everyone’s work is made public.
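A minimal sketch of one paid review round under these rules; the function name, bump sizes, and vote encoding are illustrative assumptions:

```python
def run_review_round(votes, reputations, unanimity_bump=1, persuasion_bump=2):
    """Apply the reputation rules for one paid review round.

    votes: reviewer -> (initial_vote, final_vote), each a bool answering
    "does the data fairly support the sponsor's claim?"  A reviewer may
    change position at most once, encoded as final != initial.
    """
    initial = [first for first, _ in votes.values()]
    if len(set(initial)) == 1:
        # Unanimous on the first pass: everyone gets a small bump.
        for reviewer in votes:
            reputations[reviewer] += unanimity_bump
        return reputations
    # Otherwise, after the single allowed change, find the final majority.
    finals = [final for _, final in votes.values()]
    majority = max(set(finals), key=finals.count)
    switchers = sum(1 for first, final in votes.values()
                    if first != final and final == majority)
    # Reward steadfast majority reviewers in proportion to how many
    # reviewers were persuaded to switch to their side.
    for reviewer, (first, final) in votes.items():
        if final == majority and first == final:
            reputations[reviewer] += persuasion_bump * switchers
    return reputations
```

For example, if reviewers a and c hold "yes" throughout and b switches from "no" to "yes", a and c each earn `persuasion_bump * 1` while b earns nothing, matching the "reward those who persuade" rule.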
I’m sure that there are people who have thought about reputation systems more than I have. But I have mostly seen reputation systems as a mechanism for creating a community where certain standards are upheld in the absence of monetary incentives. A reputation system that is robust against gaming seems difficult.
I’m very glad I asked for more clarification. I’m going to call this system The Reviewer’s Dilemma; it’s a very interesting solution for allowing non-software analysis to occur in a trusted manner. I am somewhat worried about a laziness bias (it’s much easier to agree than to disprove), but I imagine that if there is a similar bounty for overturning previous results, this might be handled.
I’ll do a little customer development with some friends, but the possibility of reviewers being added as co-authors might also act as a nice incentive (both to reduce laziness and as additional compensation).
5 - What do the participants get? Is that simply up to the sponsor? If so, who reviews it to assure that the incentive does not distort the data? If no one, will you at least require that the incentive be reported as part of the trial?
We need to design rules governing participant compensation.
At a minimum I think all compensation should be reported (it’s part of what’s needed for replication), and of course not tied to the results a participant reports. Ideally we create a couple of defined protocols for locating participants, and people largely choose to go with a known good solution.
StackOverflow et al. are also free and offer no compensation except points, awards, and reputation. Maybe it can be combined: points for regular participation, prominent mention somewhere, and awards being real rewards. The downside is that this may pose moral hazards of some kind.
Oh, interesting.

I had been assuming that participants needed to be drawn from the general population. If we don’t think there’s too much hazard there, I agree a points system would work. Some portion of the population would likely just enjoy the idea of receiving free product to test.
I would worry about sampling bias due to selection based on, say, enjoying points.

For studies in which people have to actively involve themselves and consent to participate, I believe there is always going to be some sampling bias. At best we can make it really, really small; at worst, we should state clearly what we believe those biases in our population to be.
At worst, we will have a better understanding of what goes into the results.
Also, for some studies, the sampled population might, by necessity, be a subset of the population.
4 - What action does the organization behind the app take if a sponsor publicly misrepresents the data or, more likely, its meaning? If the organization would take action, does it take the same action if the statement is merely misleading, rather than factually incorrect?
I imagined actions similar to those the Free Software Foundation takes when a company violates the GPL: basically a lawsuit and a press release warning people. For template studies, ideally the claims that can be made would be specified by the template (e.g., “Our users lost XY more pounds over Z time”).
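A template could fix the claim’s wording and let the sponsor supply only the measured numbers. A minimal sketch, with hypothetical field names:

```python
# Hypothetical template: the wording is fixed by the protocol, and only
# the measured values may be filled in by the sponsor.
WEIGHT_LOSS_CLAIM = ("Our users lost {delta_lbs:.1f} more pounds over "
                     "{weeks:d} weeks than the control group.")

def render_claim(template: str, **measured) -> str:
    """Render an approved claim.

    str.format raises KeyError if a required measured value is missing,
    so the sponsor cannot omit a field or reword the sentence.
    """
    return template.format(**measured)
```

Anything outside the placeholders is then, by construction, language the organization has already approved.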
One option is simply to report it to the Federal Trade Commission for investigation, along with a negative publicity statement. That externalizes the cost.
If you would like assistance drafting the agreements, I am a lawyer and would be happy to help. I have deep knowledge about technology businesses, intellectual property licensing, and contracting, mid-level knowledge about data privacy, light knowledge about HIPAA, and no knowledge about medical testing or these types of protocols. I’m also more than fully employed, so you’d have the constraint of taking the time I could afford to donate.
The FTC is so much better than a lawsuit. I don’t know a single advertiser that isn’t afraid of the FTC. It looks like enforcement is tied to complaint numbers, so the press release should include information about how to complain personally (and should go out to a mailing list as well).
I would love assistance with the agreements. It sounds like you would be more suited to the Business <> Non-Profit agreements than the Participant <> Business agreements. How do I maximize the value of your contribution? Are you more suited to the high-level term sheet, or the final wording?
1 - For more complicated propositions, who does the math and statistics? The application apparently gathers the data, but it is still subject to interpretation.
This problem can be reduced in size by having the webapp give out blinded data, and only reveal group names after the analysis has been publicly committed to. If participating companies are unhappy with the existing modules, they could perhaps hire “statistical consultants” to add a module, permanently improving the site for everyone.
I think I get your meaning: the webapp itself would carry out the testing protocol. I was thinking it would be designed by the sponsor using standardized components; what you are saying is that it would be more rigid than that. This would allow much more certainty in the meaning of the result. Your example of “using X resulted in average weight loss of Y compared to a control group” would be a case that could be standardized, where “average weight loss” is a configurable data element.
Yes. I think if we can manage it, requiring data-analysis to be pre-declared is just better. I don’t think science as a whole can do this, because not all data is as cheap to produce as product testing data.
Now that I’ve heard your reply to question #8, I need to consider this again. Perhaps we could have some basic claims done by software, while allowing for additional claims such as “those over 50 show twice the results” to be verified by grad students. I will think about this.
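The "pre-declare, then unblind" idea above can be made publicly verifiable with a simple hash commitment: publish a digest of the analysis code while the data is still blinded, and reveal group labels only afterwards. A minimal sketch (a production scheme would likely also add a random salt):

```python
import hashlib

def commit(analysis_source: str) -> str:
    """Digest to publish *before* group labels are revealed."""
    return hashlib.sha256(analysis_source.encode("utf-8")).hexdigest()

def verify(analysis_source: str, published_digest: str) -> bool:
    """Anyone can later check that the analysis that actually ran
    is the one that was committed to before unblinding."""
    return commit(analysis_source) == published_digest
```

Any post-hoc edit to the analysis, however small, changes the digest and is immediately detectable by third parties.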
Thank you! This is exactly the kind of discussion I was hoping for.
The general answer to your questions is: I want to build whatever LessWrong wants me to build. If it’s debated in the open, and agreed as the least-worst option, that’s the plan.
I’ll post answers to each question in a separate thread, since they raise a lot of questions I was hoping for feedback on.