That is a fair point! I don’t think Zvi et al. are obligated, and I’m not, like, going to call them fraudster hacks if they’re not interested.
I said this more in the hope that people frustrated with unaccountable governance would want to seize the mantle of personal responsibility, to show everyone that they are pure and incorruptible and that it can be done. My post came across as more of a demand than I meant it to, which I apologize for.
Organizations can distribute their money how they want. My concern here is more “can pillars of the rat community get funding for crappy ideas on the basis of being pillars and having a buddy in grantmaking?” I want to judge EA orgs on their merits and I want to judge Zvi on his merits. If Balsa flops, who do we give less money to?
Zvi said on his Substack that he would consider this a worthwhile venture if there were a 2% chance of achieving a major federal policy goal. Are there lesser goals that Zvi thinks they can hit at 50% or 90%? If not, then okay. Sometimes that is just how it is and you have to do the low-probability, high-EV thing. But even if it’s just the 2% thing, I would like Brier scores to update.
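To be concrete about what I mean by that, here is a minimal sketch; every goal and probability below is invented for illustration, not something Balsa has stated:

```python
# A minimal sketch (my construction, not anything Balsa has published) of
# what "Brier scores updating" means: score stated probabilities against
# what actually happened. 0.0 is perfect; always answering 50% scores 0.25.

def brier_score(forecasts):
    """Mean squared error between stated probabilities and 0/1 outcomes."""
    return sum((p - float(outcome)) ** 2 for p, outcome in forecasts) / len(forecasts)

# Hypothetical track record: (stated probability, did it happen?)
record = [
    (0.02, False),  # major federal policy win
    (0.50, True),   # mid-tier goal, e.g. a bill gets a committee hearing
    (0.90, True),   # minor goal, e.g. ship three white papers in year one
]
print(f"Brier score: {brier_score(record):.3f}")  # 0.087
```

The point is just that 50% and 90% predictions resolve often enough to be scoreable, where a single 2% long shot does not.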
So the other concern is track record legibility. There is a lot of deferral among rats, some of it even necessary. Not every person can be a machine learning person. I’ve been reading LW for eight years, and plenty of what Vance and Zvi write, but I only heard of MetaMed a few months ago while looking at Vance’s LinkedIn.
Searching for it on the forums got very thin results. EY endorsed it strongly (which I believe counts as a ding on his track record if anyone is maintaining that anywhere); Alexander advertised it but remained neutral as to whether it was a good idea. So this was a big thing that the community was excited about, and it turned to shit. I believe it turned to shit without enough discussion in the aftermath of why, of what premises people had wrong. I have read the post-mortems and found them lacking.
“Can you run a business well?” doesn’t say much about someone’s epistemics, but “Can you identify the best interventions with which to make use of your time?” absolutely does, and so does “Can you win?”, and the way to see that is how the project empirically performs. This is a fallible test: you can do a good job at the identification and just suck at the business, or just be unlucky, but I’m still going to update towards someone being untrustworthy or incompetent based on it.
Other good reasons not to do this: it is extremely plausible that making all your goals legible is an impediment to policy work. A solution to that might be timed cryptography, or an independent party keeping track of their goals and reporting the results of the predictions sans what they were predicting. I am aware that this is a non-trivial inconvenience, and I would respect the founders considerably more if they went for it.
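The cryptographic half of that is not exotic, for what it’s worth: a plain hash commitment already gives you “checkable later without being legible now”, and timed-release schemes would add a forced reveal. A sketch, with an invented goal string:

```python
# Sketch of the goals-without-legibility idea using a plain hash commitment.
# (Timed-release cryptography would add a forced reveal; SHA-256 stands in
# here.) The goal string and probability are invented for illustration.
import hashlib
import secrets

def commit(goal: str) -> tuple[str, str]:
    salt = secrets.token_hex(16)  # blocks brute-forcing a short list of guessable goals
    digest = hashlib.sha256(f"{salt}:{goal}".encode()).hexdigest()
    return digest, salt           # publish the digest now; keep the salt private

def verify(digest: str, salt: str, goal: str) -> bool:
    """Anyone can check a later reveal against the earlier public digest."""
    return hashlib.sha256(f"{salt}:{goal}".encode()).hexdigest() == digest

digest, salt = commit("Get provision X into the next appropriations bill [p=0.4]")
# ...the prediction resolves, they publish goal and salt, and anyone verifies:
assert verify(digest, salt, "Get provision X into the next appropriations bill [p=0.4]")
```

The salt is load-bearing: without it, anyone could hash a short list of plausible policy goals and unmask the commitment early.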
I am also keenly aware that this is a demand for rigor more isolated than the ice caps. I recognize the failure mode where you demand everyone wear their marks on their sleeve, but in practice only the black ones seem to stick over time. I think that’s really bad, because then you end up cycling veterans out and replacing them with new people who are no better or worse. Hopefully we can manage to not end up there.
I think I am much more ambivalent than I sounded in my first post, but I wanted to discuss this. Hopefully it doesn’t cause anyone undue stress.
> EY endorsed it strongly (which I believe counts as a ding on his track record if anyone is maintaining that anywhere)
I don’t think it’s a ding on his track record.
He tried a product.
It worked shockingly well for him.
He recommended others use that product.
This is a basic prosocial act. You haven’t made an argument that the product was low-quality; the failure of the company only shows that there wasn’t enough of a market for that particular product to sustain the company. For the most part I’m glad Eliezer advertised it while I could still buy it; it seems like the product was pretty great (though very expensive).
For context, here is an advert he gave for them, which has no endorsement of funding the company, nor says the organization is well run, and entirely focuses on his experience of the product (with the exception of one parenthetical).
> You know Harry’s non-24 sleep disorder? I have that. Normally my days are around 24 hours and 30 minutes long.
>
> Around a year ago, some friends of mine cofounded MetaMed, intended to provide high-grade analysis of the medical literature for people with solution-resistant medical problems. (I.e. their people know Bayesian statistics and don’t automatically believe every paper that claims to be ‘statistically significant’ – in a world where only 20-30% of studies replicate, they not only search the literature, but try to figure out what’s actually true.) MetaMed offered to demonstrate by tackling the problem of my ever-advancing sleep cycle.
>
> Here’s some of the things I’ve previously tried:
>
> - Taking low-dose melatonin 1-2 hours before bedtime
> - Using timed-release melatonin
> - Installing red lights (blue light tells your brain not to start making melatonin)
> - Using blue-blocking sunglasses after sunset
> - Wearing earplugs
> - Using a sleep mask
> - Watching the sunrise
> - Watching the sunset
> - Blocking out all light from the windows in my bedroom using aluminum foil, then lining the door-edges with foam to prevent light from slipping in the cracks, so I wouldn’t have to use a sleep mask
> - Spending a total of ~$2200 on three different mattresses (I cannot afford the high-end stuff, so I tried several mid-end ones)
> - Trying 4 different pillows, including memory foam, and finally settling on a folded picnic blanket stuffed into a pillowcase (everything else was too thick)
> - Putting 2 humidifiers in my room, a warm humidifier and a cold humidifier, in case dryness was causing my nose to stuff up and thereby diminish sleep quality
> - Buying an auto-adjusting CPAP machine for $650 off Craigslist in case I had sleep apnea. ($650 is half the price of the sleep study required to determine if you need a CPAP machine.)
> - Taking modafinil and R-modafinil.
> - Buying a gradual-light-intensity-increasing, sun alarm clock for ~$150
>
> Not all of this was futile – I kept the darkened room, the humidifiers, the red lights, the earplugs, and one of the mattresses; and continued taking the low-dose and time-release melatonin. But that didn’t prevent my sleep cycle from advancing 3 hours per week (until my bedtime was after sunrise, whereupon I would lose several days to staying awake until sunset, after which my sleep cycle began slowly advancing again).
>
> MetaMed produced a long summary of extant research on non-24 sleep disorder, which I skimmed, and concluded by saying that – based on how the nadir of body temperature varies for people with non-24 sleep disorder and what this implied about my circadian rhythm – their best suggestion, although it had little or no clinical backing, was that I should take my low-dose melatonin 5-7 hours before bedtime, instead of 1-2 hours, a recommendation which I’d never heard anywhere before.
>
> And it worked.
>
> I can’t *#&$ing believe that #*$%ing worked.
>
> (EDIT in response to reader questions: “Low-dose” melatonin is 200microgram (mcg) = 0.2 mg. Currently I’m taking 0.2mg 5.5hr in advance, and taking 1mg timed-release just before closing my eyes to sleep. However, I worked up to that over time – I started out just taking 0.3mg total, and I would recommend to anyone else that they start at 0.2mg.)
>
> Sticker shock warning: MetaMed’s charge for an analysis starts at $5K, or around double the cost of everything else I tried put together – it’s either for people who have money, or people who have resistant serious problems. (Of course MetaMed dreams of eventually converting all of medicine to a saner footing, but right now they have to charge significant amounts to initial customers.) And by the nature of MetaMed’s task, results are definitely not guaranteed – but it worked for me.
Note: I didn’t read the HPMOR advert, I read the one here on LW which is different. It starts like this:

> ...there’s also MetaMed. Instead of just having “evidence-based medicine” in journals that doctors don’t actually read, MetaMed will provide you with actual evidence-based healthcare.

You’re right that he doesn’t make any specific verifiable claims so much as be very glowing and excited. It does still make me less inclined to trust his predictive ability (or to trust him, depending on how much is him believing in that stuff vs. building up hype for whatever reason).
I do think this ad doesn’t line up with what you said re: “[...] nor says the organization is well run, and entirely focuses on his experience of the product (with the exception of one parenthetical).”
As I understand it, you’re updating against treating his recommendation of a product made by his friends as strong evidence that the company won’t later go out of business. This seems fine to me.
I’m saying that his endorsement of the product seems eminently reasonable to me, that it was indeed life-changing for him on a level that very few products ever are, and that, in general, given that kind of information about a product, I don’t think he made any errors of judgment, and he acted prosocially.
I will continue to take his product advice seriously, but I will not expect that just because a company is run by rationalists, or because Eliezer endorses the product, this is especially strong evidence that they will succeed on the business fundamentals.
I think you were mistaken to call it a “ding on his track record” because he did not endorse investing in the company, he endorsed using the product, and this seems like the right epistemic state to me. From the evidence I have about MetaMed, I would really want to have access to their product.
As an example, if he’d written a post called “Great Investment Opportunity: MetaMed”, this would be a ding on his track record. Instead he wrote a post called “MetaMed: Evidence-Based Healthcare”, and this seems accurate, and a positive sign about his track record of product recommendations.
> This is a new service and it has to interact with the existing medical system, so they are currently expensive, starting at $5,000 for a research report. (Keeping in mind that a basic report involves a lot of work by people who must be good at math.)
Unrelatedly, but from the same advert. I had not realized it was that expensive. This rings some alarm bells for me, but maybe it is fine; it is, after all, a medical service. I have been waffling back and forth and will conclude that I don’t know enough of the details.
Regardless, the alarm bells still made me want to survey the comments and see if anyone else was alarmed. Summaries of the top-level comments:
> The words “evidence-based medicine” seem to imply “non-evidence-based medicine”
> Will MetaMed make its research freely available?
> Proposals re: the idea that MetaMed might not improve the world save for their clients
> You should disclose that MIRI shares sponsors with MetaMed, detail question
> Please send this to the front page!
> I’m overall not impressed, here are a couple criticisms, what does MetaMed have over uptodate.com in terms of comparative advantage? (Nice going user EHeller, have some Bayes points.)
> Discussion of doctors and their understanding of probability
> MetaMed has gone out of business (3 years later)
> Is MetaMed a continuation of a vanished company called Personalized Medicine?
> A friend of mine has terrible fibromyalgia and would pay 5k for relief but not for a literature search of unknown benefit. I guess she’s not the target audience? (long thread, MetaMed research is cited, EHeller again disputes its value compared to less expensive sources)
> An aside on rat poison
> How might MetaMed and IBM Watson compare and contrast?
> Error in advert: Jaan Tallinn is not the CEO but chairman, Zvi is the CEO.
> Is MetaMed LW-y enough that we should precommit to updating by prespecified amounts on the effectiveness of LW rationality in response to its successes and failures?
There I will cut off, because the last commenter is after my own heart. Gwern responds by saying:
> At a first glance, I’m not sure humans can update by prespecified amounts, much less prespecified amounts of the right quantity in this case: something like >95% of all startups fail for various reasons, so even if LW-think could double the standard odds (let’s not dicker around with merely increasing effectiveness by 50% or something, let’s go all the way to +100%!), you’re trying to see the difference between… a 5% success rate and a 10% success rate. One observation just isn’t going to count for much here.
And that is correct. But you don’t have to make a single success/fail prediction; you should be able to come up with predictions about your company that you can put higher numbers on, and we can see how those empirically turn out. Or you could even keep track of all the startups launched by prominent LW members.
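To run gwern’s numbers through Bayes explicitly (the arithmetic is mine, not his):

```python
# gwern's 5%-vs-10% point run through Bayes (my arithmetic, not his):
# start at even odds that "LW-think doubles the base rate," observe one failure.
prior = 0.5                  # P(LW-think doubles the success rate)
p_fail_if_boosted = 0.90     # failure probability if the true success rate is 10%
p_fail_if_baseline = 0.95    # failure probability if the true success rate is 5%

posterior = (p_fail_if_boosted * prior) / (
    p_fail_if_boosted * prior + p_fail_if_baseline * (1 - prior)
)
print(f"{posterior:.3f}")    # 0.486 -- one failure moves you less than 2 points
```

One resolved startup barely moves the needle; twenty resolved 90% predictions would.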
In contrast, Michael Vassar (who was also on the project) says,
> Definitely, though others must decide the update size.
Which I don’t think anyone followed through on, perhaps because they then agreed with gwern?
Anyway, it seems plausible the correct update size for a founder running a failed startup is a couple percentage points of confidence in them along certain metrics.
I think MetaMed warrants more of an update than that, my basic reasoning being: 1) I think it was entirely possible to see what was wrong with the idea before they kicked it off, 2) accounting for the possibility of bad faith, 3) Constantin’s post-mortem suggests some maybe-serious issues, and 4) I consider Zvi’s post-mortem to be more deflective than an attempt at real self-evaluation. So maybe like 6-9 points?
I, uh, don’t actually think Balsa is at all likely to be bad or anything. Please don’t let that be your takeaway here. I expect them to write some interesting papers, take a few minutely useful actions, and then pack it in [65%]. There’s no justification for these posts being as long as they are, except that I personally find the topic interesting and want to speak my mind.
I expect I got some things wrong here, feel free to let me know what errors you notice.
I’m torn because:

I think a lot of your individual arguments are incorrect (e.g. $5000 is a steal for MetaMed’s product if they delivered what they promised. This includes promising only a 10% chance of success, if the problems are big enough).
I nonetheless agree with you that one should update downward on the chance of Balsa’s success due to the gestalt of information that has come out on Zvi and MetaMed (e.g. Zvi saying MetaMed was a definitive test of whether people cared about health or signaling care, while Sarah lays out a bunch of prosaic problems).
I think “we” is a bad framing as long as the project isn’t asking for small donor funding.
I do think grand vague plans with insufficient specifics (aka “goals”) are overrewarded on LW.
OTOH I have a (less) grand vague project that I’m referring to in other posts but not laying out in totality in its own post, specifically because of this, and I think that might be leaving value on the table in the form of lost feedback and potential collaborators. A way for me to lay out grand vague plans as “here’s what I’m working on”, but without making status claims that would need to be debunked, would be very useful.
OTTH it’s maybe fine or even good if I have to produce three object-level blog posts before I can lay out the grand vague goal.
But also it’s bad to discourage grand goals just because they haven’t reached the plan stage yet.
Yes, there are lesser goals that I could hit with 90% probability. Note that in that comment, I was saying that 2% would make the project attractive, rather than saying I put our chances of success at 2%. And also that the bar there was set very high—getting a clear attributable major policy win. Which then got someone willing to take the YES side at 5% (Ross).