I’m interested in talking to people knowledgeable in decision theory/Bayesian statistics about a startup that aims to disrupt the $240,000,000,000 management consulting market. It’s based on the idea of prediction polls, but done on the blockchain (the same technology Bitcoin uses) in a completely decentralized way.
I’m particularly interested in people who can help me out with understanding/choosing alternative scoring rules besides Brier scoring.
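For concreteness, here’s a minimal sketch (Python; the function names and numbers are my own illustration, not part of the proposal) of the Brier score next to the most common alternative, the logarithmic score. Both are proper scoring rules, meaning an honest probability report minimizes your expected penalty, but they treat confident misses very differently:

```python
import math

# Two proper scoring rules for binary-outcome forecasts, written as
# penalties (lower is better). p is the stated probability of the event;
# outcome is 1 if it happened and 0 if it didn't.

def brier_penalty(p: float, outcome: int) -> float:
    """Quadratic penalty: bounded between 0 and 1 for binary events."""
    return (p - outcome) ** 2

def log_penalty(p: float, outcome: int) -> float:
    """Logarithmic penalty: unbounded, brutal on confident misses."""
    return -math.log(p if outcome == 1 else 1.0 - p)

# A 99% prediction that turns out wrong:
print(brier_penalty(0.99, 0))  # 0.9801 -- capped at 1
print(log_penalty(0.99, 0))    # ~4.61  -- grows without bound as p -> 1
```

The choice matters for incentives: the log rule never lets a forecaster recover from stating probability 1 on something that fails, while the Brier rule caps the damage.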
I can’t pay you for your time, but I can virtually order you a pizza or buy you a beer :).
edit: Here’s the (still very rough) elevator pitch:
For a long time companies relied on a pretty fuzzy metric: People who seemed to be better at making good decisions got to make them. This worked out decently well, but led to one undesirable result: People who were good at making excuses about their decisions ALSO got to make decisions.
The thing was, we didn’t really have a better way to do it. That is, until the data revolution. Suddenly, companies had access to tons of data that they could use to ACTUALLY make better decisions. The problem was, they weren’t politically set up to make use of this data, because all the people in power were those who could make good excuses.
This is where management consulting companies came in. For really big decisions, the management consulting companies would come in as outsiders, charge a bunch of money, and use their clout to use the data to make big decisions (like how many people to fire). The industry rapidly grew into the 240 billion dollar business it is today.
But there’s a huge problem with the industry—there’s no objective way to tell which companies are actually good at making decisions. This leads to a case where the only way to tell which companies are good is their name and reputation—which means a monopolistic signalling market where the very few who got in early and made a name for themselves get to overcharge for their name, and new cheaper players find it very hard to enter the market.
The solution: An objective metric (a Bayesian scoring rule) that shows how good an organization or individual is at predicting the future. The entire history of how the company got this score is available on the blockchain, so you avoid the signaling problem by making everything auditable and therefore not having to put your trust in any one brand or company.
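As a toy illustration of the metric idea (the record format and the choice of a mean Brier score are my assumptions, not a spec of the actual product): if every prediction an organization publishes is stored as a (probability, outcome) pair in the auditable history, anyone can recompute its track record from scratch:

```python
# Hypothetical auditable history: (stated probability, realized outcome) pairs.
history = [(0.9, 1), (0.7, 1), (0.2, 0), (0.8, 0), (0.6, 1)]

# Mean Brier score as the track-record metric (lower is better;
# always answering 0.5 scores exactly 0.25).
mean_brier = sum((p - o) ** 2 for p, o in history) / len(history)
print(f"mean Brier score: {mean_brier:.3f}")
```

Because the inputs are public and the computation is deterministic, the score needs no trusted scorekeeper, which is the whole point of putting it on a blockchain.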
Not only can this allow us to take over all the big problems that management consulting currently handles, but it opens up a whole class of smaller decisions that were simply cost prohibitive in the management consulting model, and creates a new paradigm for management as a result.
Edit 2: If you’re effective-altruist minded, it may be of interest to know that the reason I’m interested in doing this is to drastically reduce the cost of impact assessments.
You seem to assume that the management consulting companies are paid for making the correct decision based on the data… as opposed to giving the answer someone important in the management (the person who made the decision to hire them) wanted to hear, while providing this person plausible deniability (“it wasn’t my idea; it’s what the world-renowned experts told us to do; are you going to doubt them?”).
Depending on which view is correct, there may or may not be a market demand for your solution.
I’ve seen this cynical viewpoint before. Honest question—what do you know about management consulting? What specific management consulting decisions are you basing this theory off of and how common are they? And how much of consulting consists of much more boring activities like developing new supply chains and inventory systems, rather than Machiavellian strategizing?
I have no direct experience with management consulting.
My opinions are formed by: my own observations of office politics; reading Dilbert; reading Robin Hanson; listening to stories of my friend who is an IT consultant. But I trust the other sources because they are compatible with what I observe.
Maybe it depends on a company, and maybe the one where I work now is an unusually dysfunctional one (or maybe I just have better information channels and pay better attention), but most management decisions are completely idiotic. What the managers are good at optimizing for is keeping their jobs. Even that is not done by making sure the projects succeed, but rather by destroying internal competitors.
For example, one of our managers was fired because our IT support department was actively sabotaging our project for a few months and we had no budget to seek help elsewhere; so we missed a few deadlines because we didn’t even have functioning servers, and then the guy was fired for incompetence. The new manager is a good friend of the IT support manager, so when he got his role, our IT support department stopped actively sabotaging us. This was all he ever did for us; otherwise he almost completely ignores the project. He is praised as a competent leader, because we have now managed to catch up with the schedule. That was mostly because of the hard work of our three most competent developers. One of them was recently fired, because we had too many people on the team. And that’s because the new manager also brought a few developers from his old team; they do absolutely nothing, officially because they are experts on a different programming language, but we secretly suspect they don’t actually know programming, so they don’t contribute and mostly don’t even go to work, but now they are part of our budget, so someone else had to go. Why not pick randomly?
How is it possible that such systems survive? My explanation is that nerds are really bad at playing power games (actually so bad that they don’t even realize that such games exist or need to be played; they may even object violently, which makes them really bad allies in such games, which is why no one will even try to ally with them and educate them). Instead our weakness is the eternal childish desire to be praised by a parent figure for being smart. So whenever shit hits the fan, the developers will work extra hard to fix the problem—without even thinking about using that as a leverage to gain more power in the organization. Most nerds are too shy to ask for a pay raise, even if they have just saved the management’s collective asses. So the managers can afford to ignore the technical aspects completely as something that happens automatically at a constant cost, and can focus fully on their own internal fights.
A few months ago I was ‘jokingly’ trying to get my colleagues to explore the idea of what could happen if the developers decided to form a union. How the management would be completely at our mercy, because they don’t understand anything, are unable to hire a replacement quickly (it took them forever to find and hire us), and with the tight schedule any waste of time would totally sink the project. Even if we used this power only for goals compatible with the company’s goals, we could negotiate to remove a lot of inefficiency and improve our working conditions. But we could also all ask for a raise, and for the company this whole revolution would still be profitable. -- My colleagues listened to me, mostly agreed with some conclusions on an abstract level, and then laughed because it was obviously such a silly idea. They all live in the imaginary universe where your income and quality of life are directly proportional to your coding skills and nothing else matters. I was screaming internally, but I politely laughed with them. Now some of them are being fired, regardless of their competence and hard work, and more will follow. Har har. I don’t worry about them too much; it will be easy to find another job. But it will be more or less the same thing, over and over again. They had an opportunity for something better and they ignored it completely. Worst case, if the plan had backfired, they would be in the same situation they are in now.
As Plato said, one of the penalties for refusing to participate in politics is that you end up being governed by your inferiors. That’s IT business in a nutshell.
Maybe it depends on a company, and maybe the one where I work now is an unusually dysfunctional one (or maybe I just have better information channels and pay better attention), but most management decisions are completely idiotic
It is also possible that you aren’t aware of most of what your management does. I’ll take your word for it that many of their decisions that are visible to you are poor (maybe most of their decision are, but I’m not yet convinced). As for management consulting, I suppose that is an inferential gap that is going to be hard to bridge.
The implication of my story for management consulting is: if this company (assuming that I have described it correctly) would ever hire a management consulting company, why would they decide to do it, how would they choose the specific company, what task would they give to the company, and how would they use the results?
My model says that they wouldn’t hire the management consulting company other than as a move in some internal power struggle; the choice would most likely be made on the basis of “some important person’s friend works for the consulting company or recommended the company”; they would give the company a completely false description of our organization and would choose the most uninformed and incompetent people as speakers (for example, they might choose one of those ‘programmers’ who doesn’t contribute to our project as the person who will describe the project to the consultants); and whatever reports the consulting company would give to us, our management would completely reinterpret them to fit their existing beliefs.
In other words, I have no direct information about the management consulting companies, but I have a model of their customers; and that model says that in the market for management consulting the actual quality of the advice is irrelevant. (Unless companies like this are a minority on the market.)
It is also possible that you aren’t aware of most of what your management does.
The upper echelons don’t invite me to their meetings, so there is always a chance. But when I tried to socialize with some of the lower managers, the story is usually that the higher managers mostly sabotage their work by “hit-and-run management”. It works like this: the higher manager knows nothing about the project and most of the time doesn’t even care. Suddenly they become interested in some detail (e.g. something went wrong and the customer complained to them, or they just randomly heard something and decided to “be useful”). So they come and start micromanaging to optimize for that detail, completely ignoring all the context. Sometimes they contribute to improving the detail, sometimes the detail would be fixed in exactly the same time even without their contribution; but in the process they usually do harm to all the other things.
For example, imagine that there are five known minor bugs in the program, and there are five programmers working on them. Under usual circumstances, all five bugs would be solved in a day, one bug per programmer. But the customer complained about one of the bugs on the phone, so the big boss comes and makes everyone work on that one bug. So at the end of the day, only one of the five bugs is fixed, and the big boss leaves, feeling victorious. (From his point of view the story probably reads: “Without my oversight, this department produced software with an error that bothered the customer, but thanks to my heroic action, the whole problem was solved in a single day. Yay for being agenty!”) Meanwhile, the things that would allow us to detect and fix the bugs more reliably before they even get to the customer, such as automated testing or even using written test scenarios, are ignored for years (this is not an exaggeration), no matter how often the programmers complain about them at meetings.
Another issue is that the managers never cross-check the information they get. That makes “being the first one who has an opportunity to tell managers their version of the story” critical. For example, we need some work done from the IT support department. The support does step 1 of 10, then reports to managers “it’s done” and stops working on the issue. The managers are happy, the programmers keep waiting… a few months later the topic gets mentioned at the meeting, the manager is like: “so, you guys were happy to have this problem solved so quickly, right?”, the programmers are like: “wtf, we keep waiting for months”, the manager is: “wtf, the problem was already solved months ago”, the programmers: “no way!”, the manager: “okay, let’s call the support on the phone”, the support: “sure, the problem was solved months ago… oh yeah, yeah… well, it wasn’t solved completely, there are still a few details missing (such as steps 2 to 10), but the programmers never complained about that so we thought that was okay”, the manager: “guys, seriously, why don’t you communicate more clearly”, the programmers: “well, the steps 1 to 10 were clearly described in the specification we had to write for the support department”, the support: “well, we were not sure you really needed that”… And the next time again we need something, the support again mostly ignores it and reports the work as done, and no manager bothers to verify. Similarly when we get incomplete specifications, etc. Many people in the company use this opportunity to not do their work, report it as done, and later use some half-assed excuse. Only the programmers have to write the code, otherwise the customer would complain. Anyone else only generates internal complaints, which are not taken seriously by the management.
Somewhat related to this: imagine that you would manage a project where three employees, A, B, C, each have to do one aspect of the project, and the next one can start their job only after the previous one has finished. For example: specification, programming, testing. And you would have 10 days to deliver the results to the customer. Well, I would certainly create internal deadlines, e.g. A must deliver their part on day 3, B must deliver their part on day 6, C must deliver their part on day 9, and there is one day of reserve. If on day 4 the employee A tells me he is not ready and will probably not complete it even today, I would treat that as an impending crisis, because every delay caused by A means less time for B and C. -- Instead, our managers simply write into their private calendars that on day 10 we need to deliver the product to the customer, and they feel their work is done. The person A usually takes 8 days to do their part, and even then they often hand incomplete work to B, who will work like crazy for the remaining 2 days, and the part of C is usually skipped (C is testing, which is why we then have so many bugs reported by the customer). The managers start being active on day 10, usually when they return from lunch, and start reminding everyone that today is the critical day when we need to deliver the product to the customer. The employee A has their work already finished, so there is no pressure on them; all the pressure goes to B and C. And this keeps happening again and again, every few weeks, for years. If you try to speak with the managers about it, they tell you “yes, we are aware of the problem, and we are working on solving it”. Just like they told you a year ago.
Another failure mode typical for our company is the following: there is some work X that needs to be done, but none of our employees is an expert on X, and we can’t solve the problem using Google (also we have other work to do). We keep reminding the management that we would need some expert on X; either a new employee, or at least an external consultant who would spend a day or two working with us. There are two ways this can end. Option 1 -- a year or two later the management finally tells us they will invite the external expert, but only for an hour or two, because the expert is very expensive. We keep waiting. A week or two later we are told that the expert has already been here. “Really? Who did he talk with?” No one knows, but after another week we find out it was someone irrelevant who knows nothing about our project or about X. “So what did he ask the expert?” Most likely, it was something different that either doesn’t apply to our project, or is so simple that we could have answered it ourselves. “So what did the expert answer?” Sorry, we forgot. Nope, no one took notes. Then the management says: “Okay guys, we already did what you wanted, now please stop making excuses and finally do the work we were supposed to deliver to the customer a year ago.” Option 2 -- a year or two later an expert on X is hired. Everyone on the team celebrates. However, the next day the person is given a task to work on some completely unrelated Y. Why? For some reason management suddenly believes it is the highest priority of the day, although it is something we could have solved without the new guy. So the new guy works on Y and doesn’t have time for X. Then the new guy is told to work on Z, et cetera. A few months later the new guy is annoyed and quits, because he wants to specialize in X, but he was given no time to do that here; so our problems remain unsolved. Then the management says: “Okay guys, we had an expert on X here, now please stop making excuses and complete the work.”
Eh, I could go on like this for days. The point is, I don’t believe there is some higher wisdom there. Other than the fact that we get government projects because of political connections, so the actual quality of our product is irrelevant as long as it works and is completed more or less on time; and even that is often a problem.
Depending on which view is correct, there may or may not be a market demand for your solution.
This question is something that keeps me up at night.
In the long term, I’m confident that if the latter case is true, my solution will (eventually) outcompete anyone using management consultants. Because of the blockchain-based business model, this is a possibility that the company (in the loosest sense of the word) can handle. This would be the worst case scenario.
“The market can stay irrational longer than you can stay solvent.”
That’s not how the blockchain works—once the app is there, it exists forever (at least as long as other apps are using that same blockchain), and it can limp along as long as it needs to until the market catches up. It’s one of the key reasons I chose the business model I did (which allows investors to make money from the app being successful, no matter whether that’s from an application of the protocol I’m using, or someone else’s).
as opposed to giving the answer someone important in the management (the person who made the decision to hire them) wanted to hear, while providing this person plausible deniability (“it wasn’t my idea; it’s what the world-renowned experts told us to do; are you going to doubt them?”)
That’s a predominantly satirical view (sometimes a conspiracy theory...). I hope you’re using it that way and aren’t just ignorant...
You’re selling the efficacy of your implementation to firms for making correct decisions. The perception of correct decisions, or the potential thereof, is important.
Unless you also consider Dilbert to be a conspiracy theorist...
People are often optimizing for their own goals, instead of the goals of the organization they are working for. People are stupid. People are running on corrupted hardware. Put these three facts together, and you will see organizations where managers sometimes make genuinely bad decisions, and sometimes make decisions that help them but harm the company; and they will of course deny doing this, and sometimes they are simply lying, but sometimes they honestly believe it.
Often the quality of a decision is hard to measure, or perhaps just too easy to rationalize either way. When a manager makes a bad decision, it does not mean that the project will fail. Sometimes the employees will work harder or take overtime to fix the problems. Sometimes the company is lucky because their customer is even more dysfunctional than them, so they won’t notice how faulty and overpriced the delivered product is. When a manager makes good decisions, it does not mean that the project will succeed. Sometimes other managers that are supposed to cooperate with the department will sabotage the project to get rid of an internal competitor. Sometimes there are unpredictable outside forces, for example a new competing product appears on the market, and it happens to be better and cheaper, or just has better marketing. -- And of course, when a project succeeds, the typical manager will attribute it to their own smart decisions, and when a project fails, there will always be someone or something else to blame. It’s like in politics.
I know about scoring rules and probability assessments. Email me and we’ll set up a time to talk.
Similar to Viliam in a sibling comment, I think that this is the sort of idea that would work in the ideal world but not the real world. To channel Hanson, “Consulting is not about advice,” and thus a product that seeks to disrupt consulting by providing superior advice will simply fail. (Compare to MetaMed, which tried to disrupt medicine by providing superior diagnostics. Medicine is not about healing!)
To channel Hanson, “Consulting is not about advice,” and thus a product that seeks to disrupt consulting by providing superior advice will simply fail.
This is something that I would be interested in reading, so I think I found the link in case anyone else is interested.
Side story, I once did a case study phone interview with a consulting firm, using a real-world example of one of their clients, a major credit card company. They were tasked with finding ways to increase revenue. Without any background information I gave them a bunch of wacky, out-of-the-box, on-the-spot answers. I asked them what the real answer was.
The answer? Need more customers. That’s 500k over the course of a month for 6 MBAs. But the client gets a 30-page PDF with words on it documenting their findings, so, shrug.
Analytical skills were overrated, for the simple reason that clients usually didn’t know why they had hired us. They sent us vague requests for proposal, we returned vague case proposals, and by the time we were hired, no one was the wiser as to why exactly we were there. I got the feeling that our clients were simply trying to mimic successful businesses, and that as consultants, our earnings came from having the luck of being included in an elaborate cargo-cult ritual. … Most of my day was spent thinking up and writing PowerPoint slides.
In one case, the question I was tasked with solving had a clear and unambiguous answer: By my estimate, the client’s plan of action had a net present discounted value of negative one billion dollars. … But the client did not want analysis that contradicted their own, and my manager told me plainly that it was not our place to question what the client wanted.
The puzzle is why firms pay huge sums to big name consulting firms, when their advice comes from kids fresh out of college, who spend only a few months studying an industry they previously knew nothing about. How could such quick-made advice from ignorant recent grads be worth millions? Why don’t firms just ask their own internal recent college grads?
My guess is that most intellectuals underestimate just how dysfunctional most firms are. Firms often have big obvious misallocations of resources, where lots of folks in the firm know about the problems and workable solutions. The main issue is that many highest status folks in the firm resist such changes, as they correctly see that their status will be lowered if they embrace such solutions. The CEO often understands what needs to be done, but does not have the resources to fight this blocking coalition. But if a prestigious outside consulting firm weighs in, that can turn the status tide.
Yes the information contained in consulting advice can be obtained elsewhere at a lower cost. Firms could hire most any smart independent folks, or set up a prediction market. But alas those sources don’t have the raw strength of status to cow opponents into submission, opponents who in practice can block changes no matter what a CEO declares.
Fellow consultants and associates [said] fifty percent of the job is nodding your head at whatever’s being said, thirty percent of it is just sort of looking good, and the other twenty percent is raising an objection but then if you meet resistance, then dropping it.
So there are really two types of consulting. There’s operational consulting, you know, down on the factory floor, in the shop type improvements. That’s probably ninety-five percent of the industry. Most of it is done by firms you’ve never heard of. … And then there’s the very small elite end, strategy consulting, about five percent. And that’s much more helping CEOs make big decisions.
“Consulting is not about advice,” and thus a product that seeks to disrupt consulting by providing superior advice will simply fail.
I think this is the most legitimate objection to the entire model as stated. To be fair, the reason I want this out there is NOT just to disrupt consulting—my investors asked me for the best mass market play, and this was it. I have other uses for a cheap accurate forecasting tool that don’t involve Fortune 500 companies :).
edit: I do think that no matter what happens, a tool like this will eventually come to dominate because it’s just better—but it may take new companies out-competing other companies using the tool, which is much slower than convincing existing companies to switch from management consulting.
Compare to MetaMed, which tried to disrupt medicine by providing superior diagnostics. Medicine is not about healing!
I’d love to hear this expanded on. On the surface this comment pattern matches to the sort of low quality anti-establishment attitude that is common around here, so I’m surprised to see you write it.
Three main sources. (But first the disclaimer About Isn’t About You seems relevant—that is, even if medicine is all a sham (which I don’t believe), participating in the medical system isn’t necessarily a black mark on you personally.)
First is Robin Hanson’s summary of the literature on health economics. The medicine tag on Robin’s blog has a lot, but a good place to start is probably Cut Medicine in Half and Medicine as Scandal followed by Farm and Pet Medicine and Dog vs. Cat Medicine. To summarize it shortly, it looks like medical spending is driven by demand effects (we care so we spend to show we care) rather than supply effects (medicine is better so we consume more) or efficacy (we don’t keep good records of how effective various doctors are). His proposal for how to fund medicine shows what he thinks a more sane system would look like. (As ‘cut medicine in half’ suggests, he doesn’t think the average medical spending has a non-positive effect, but that the marginal medical spending does, to a very deep degree.)
Second is the efficiency literature on medicine. This is statisticians and efficiency experts and so on trying to apply standard industrial techniques to medicine and getting pushback that looks ludicrous to me. For example, human diagnosticians perform at the level of, or worse than, simple algorithms (I’m talking linear regressions, here, not even neural networks or decision trees or so on), and this has been known in the efficiency literature for well over fifty years. Only in rare cases does this actually get implemented in practice (for example, a flowchart for dealing with heart attacks in emergency rooms was popularized a few years back and seems to have had widespread acceptance). It’s kind of horrifying to realize that our society is smarter about, say, streamlining the production of cars than streamlining the production of health, especially given the truly horrifying scale of medical errors. Stories like Semmelweis and the difficulty getting doctors to wash their hands between patients further expand this view.
Third is from ‘the other side’; my father was a pastor and thus spent quite some time with dying people and their families. His experience, which is echoed by Yvain in Who By Very Slow Decay and seems to be the common opinion among end-of-life professionals in general, is that the person receiving end-of-life care generally doesn’t want it and would rather die in peace, and the people around them insist that they get it (mostly so that they don’t seem heartless). As Yvain puts it:
Robin Hanson sometimes writes about how health care is a form of signaling, trying to spend money to show you care about someone else. I think he’s wrong in the general case – most people pay their own health insurance – but I think he’s spot on in the case of families caring for their elderly relatives. The hospital lawyer mentioned during orientation that it never fails that the family members who live in the area and have spent lots of time with their mother/father/grandparent over the past few years are willing to let them go, but someone from 2000 miles away flies in at the last second and makes ostentatious demands that EVERYTHING POSSIBLE must be done for the patient.
Once you really grok that a huge amount of medical spending is useless torture, and if you are familiar with what it looks like to design a system to achieve an end, it becomes impossible to see the point of our medical system as healing people.
I broadly differ with the Hansonian take on medicine. I don’t think MetaMed offered more effective healing and went bust because medicine doesn’t really demand healing; rather, medicine is about healing, generally does this pretty well, and MetaMed was unable to provide a significant edge in performance over standard medicine. (I should note I am a doctor, albeit a somewhat contrarian one. I wrote the 80k careers guide on medicine.)
I think medicine is generally less fertile ground for Hansonian signalling accounts, principally because health is so important for our life and happiness we’re less willing to sacrifice it to preserve face (I’d wager it is an even better tax on bs than money). If the efficacy of marginal health spending is near zero in rich countries, that seems evidence in support of, ‘medicine is really about healing’ - we want to live healthily so much we chase the returns curve all the way to zero!
There are all manner of ways in which western world medicine does badly, but I think sometimes the faults are overblown, and the remainder are best explained by human failings rather than medicine being a sham practice:
1) My understanding of the algorithms for diagnosis is that although linear regressions and simple methods can beat humans at very precise diagnostic questions (e.g. ‘Given these factors of a patient who is mentally ill, what is their likelihood of committing suicide?’), humans still have better performance in messier (and more realistic) situations. It’d be surprising for IBM to unleash Watson on a very particular aspect of medicine (therapeutic choice in oncology) if simple methods could beat doctors across most of the board.
(I’d be very interested to see primary sources if my conviction is mistaken)
2) Medicine has become steadily more and more protocolized, and clinical decision rules, standard operating procedures and standards of care are proliferating rapidly. I agree this should have happened sooner: that Atul Gawande’s surgical checklist happened within living memory is amazing, but it is catching on, and (mildly against Hansonian explanations) has been propelled by better outcomes.
I can’t speak for the US, but there are clear protocols in the UK about initial emergency management of heart attacks. Indeed, take a gander at the UK’s ‘NICE Pathways’ which gives a flow chart on how to act in all circumstances where a heart attack is suspected.
3) I agree that the lack of efficacy information about individual doctors isn’t great. Reliable data on this is far from trivial to acquire, however, and that, together with doctors’ understandable self-interest in not being too closely monitored, seems to explain this lacuna as well as the Hansonian story does. (Patients tend to want to know this information if it is available, which doesn’t fit well with them colluding with their doctors and family in a medical ritual unconnected to their survival.)
4) Over-treatment is rife, but the US is generally held up as an anti-exemplar of this fault, and (at least judging by the anecdotes) medics in the UK are better (albeit still far from perfect) at avoiding flogging the patient to death with medical torture. Outside of this zero or negative margin, performance is better: it is unclear how much is attributable to medicine, but life expectancy and disease-free life expectancy are rising, and age-standardized mortality rates for most conditions are declining.
Now, why Metamed failed (I appreciate one should get basically no credit for predicting a start up will fail given this is the usual outcome, but I called it a long time ago):
MetaMed’s business model relied on there being a lot of low-hanging fruit to pluck: that in many cases, a diagnosis or treatment would elude the clinician because they weren’t apprised of the most recent evidence, were only able to deal in generalities rather than personalized recommendations, or were just less adept at synthesizing the evidence available.
If it were MetaMed versus the average doctor—the one who spends next-to-no time reading academic papers, who is incredibly busy, stressed out, and so on—you’d be forgiven for thinking that MetaMed has an edge. However, medics (especially generalists) have long realized they have no hope of keeping abreast of a large medical literature on their own. Enter division of labour: they instead commission the relevant experts to survey, aggregate and summarize the current state of the evidence base, leaving them the simpler task of applying it in their practice. To make sure it was up to date, they’d commission the experts to repeat this fairly often.
I mentioned NICE (the National Institute for Clinical Excellence) earlier. They’re a body in the UK who are responsible (inter alia) for deciding when drugs and treatments get funded on the NHS. They spend a vast amount of time on evidence synthesis and meta-analysis. To see what sort of work this produces, google ‘NICE {condition}’. An example for depression is here. Although I think the UK is world-leading in this aspect, there are similar bodies in other countries, as well as commercial organizations (e.g. UpToDate).
Against this, MetaMed never had any edge: they didn’t have groups of subject matter experts to call upon for each condition or treatment in question, nor (despite a lot of mathsy inclination amongst them) did they by and large have parity in terms of meta-analysis, evidence synthesis and related skills. They were also outmatched in terms of the quantity of man-hours that could be deployed, and by the great head start NICE et al. already had. When their website was still up I looked at some of their example reports, and my view was they were significantly inferior to what you could get via NICE (for free!) or UpToDate or similar services with far lower fees.
MetaMed might have had a hope if, in the course of producing these general evidence summaries, a lot of fine-grained data was being aggregated away to produce something ‘one size fits all’ - their edge would be going back to the original data to find out that although drug X is generally good for a condition, in one’s particular case, in virtue of age, genotype, or whatever else, drug Y is superior.
However, this data by and large does not exist: much of medicine is still at the stage of working out whether something works generally, rather than delving into differential response and efficacy. It is not clear it ever will—humans might be sufficiently similar to one another that for almost all of them one treatment will be the best. The general success of increasing protocolization in medicine is some further weak evidence of this point.
I generally adduce MetaMed as an example of rationalist overconfidence: the belief that insurgent Bayesians can just trounce relevant professionals at what they purport to do, thanks to signalling etc. But again, given the expectation was for it to fail (as most start-ups do), this doesn’t provide much evidence. If it had succeeded, I’d have updated much more strongly towards the magic of rationalism meaning you can win and the world being generally dysfunctional.
principally because health is so important for our life and happiness we’re less willing to sacrifice it to preserve face (I’d wager it is an even better tax on bs than money).
I agree that I expect people to be more willing to trade money for face than health for face. I think the system is slanted too heavily towards face, though.
I should also point out that this is mostly a demand side problem. If it were only a supply side problem, MetaMed could have won, but it’s not—people are interested in face more than they’re interested in health (see the example of the outdated brochure that was missing the key medical information, but looked the way a medical brochure is supposed to look).
It’d be surprising for IBM to unleash Watson on a very particular aspect of medicine (therapeutic choice in oncology) if simple methods could beat doctors across most of the board.
My understanding is that this is correct for the simple techniques, but incorrect for the complicated techniques. That is, you’re right that a single linear regression can’t replace a GP, but an NLP engine plus a twenty-questions bot plus a causal network probably could. (I unfortunately don’t have any primary sources at hand; medical diagnostics is an interest, but most of the academic citations I know are all machine diagnostics, since that’s what my research was in.)
I should also mention that, from the ML side, the technical innovation of Watson is in the NLP engine. That is, a patient could type English into a keyboard and Watson would mostly understand what they’re saying, instead of needing a nurse or doctor to translate the English into the format needed by the diagnostic tool. The main challenge with uptake of the simple techniques historically was that they only did the final computation, but most of the work in diagnostics is collecting the information from the patient. And so if the physician is 78% accurate and the linear regression is 80% accurate, is it really worth running the numbers for those extra 2%?
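To make “simple techniques” concrete, here’s a toy sketch of the kind of linear diagnostic rule the clinical-judgment literature compares against human experts (features, weights, and threshold all invented for illustration):

```python
# Toy linear diagnostic rule: a weighted sum of patient features compared
# against a threshold. The literature's point is that even crude models
# like this often match or beat expert judgment on the final combination
# step -- the hard part is gathering the features in the first place.
patient = {"age_over_60": 1, "chest_pain": 1, "st_elevation": 0, "smoker": 1}
weights = {"age_over_60": 0.8, "chest_pain": 1.2, "st_elevation": 2.5, "smoker": 0.6}

risk = sum(weights[k] * patient[k] for k in patient)
print("refer for further testing" if risk > 2.0 else "routine follow-up")
```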
From a business standpoint, I think it’s obvious why IBM is moving slowly; just like with self-driving cars, the hard problems are primarily legal and social, not technical. Even if Watson has half the error rate of a normal doctor, the legal liability status is very different, just like a self-driving car that has half the error rate of a human driver would result in more lawsuits for the manufacturer, not less. As well, if the end goal is to replace doctors, the right way to do that is imperceptibly hand more and more work over to the machines, not to jump out of the gate with a “screw you, humans!”
I agree this should have happened sooner: that Atul Gawande’s surgical checklist happened within living memory is amazing, but it is catching on, and (mildly against Hansonian explanations) has been propelled by better outcomes.
So, just like the Hansonian view of Effective Altruism is that it replaces Pretending to Try not with Actually Trying but with Pretending to Actually Try, if there is sufficient pressure to pretend to care about outcomes then we should expect people to move towards better outcomes, since their pretending involves nonzero effort.
But I think you can look at the historical spread of anesthesia vs. the historical spread of antiseptics to get a sense of the relative importance of physician convenience and patient outcomes. (This is, I think, a point brought up by Gawande.)
I think I agree with your observations about MetaMed’s competition but not necessarily about your interpretation. That is, MetaMed could have easily failed for both the reasons that its competition was strong and that its customers weren’t willing to pay for its services. I put more weight on the latter because the experience that MetaMed reported was mostly not “X doesn’t want to pay $5k for what they can get for free from NICE” but “X agrees that this is worth $100k to them, but would like to only pay me $5k for it.” (This could easily be a selection effect issue, where everyone who would choose NICE instead is silent about it.)
However, this data by and large does not exist: much of medicine is still at the stage of working out whether something works generally, rather than delving into differential response and efficacy. It is not clear it ever will—humans might be sufficiently similar to one another that for almost all of them one treatment will be the best. The general success of increasing protocolization in medicine is some further weak evidence of this point.
This is why I’m most optimistic about machine medicine, because it basically means instead of going to a doctor (who is tired / stressed / went to medical school twenty years ago and only sort of keeps up) you go to the interactive NICE protocol bot, which asks you questions / looks at your SNPs and tracked weight/heart rate/steps/sleep/etc. data / calls in a nurse or technician to investigate a specific issue, diagnoses the issue and prescribes treatment, then follows up and adjusts its treatment outcome expectations accordingly.
(Sorry for delay, and thanks for the formatting note.)
My knowledge is not very up to date re. machine medicine, but I did get to play with some of the commercially available systems, and I wasn’t hugely impressed. There may be much more impressive results yet to be released commercially, but (appealing back to my priors) I think I would have heard of them, as they would be a game-changer for global health. Also, if the fairly advanced knowledge work of primary care can be done by computer, I’d expect a lot of jobs without the protective features of medicine to be automated.
I agree that machine medicine along the lines you suggest will be superior to human performance, and I anticipate this will be achieved (even if I am right and it hasn’t already happened) fairly soon. I think medicine will survive less through the cognitive skill required than through technical facility and social interactions, where machines comparably lag (of course, I anticipate they will steadily get better at this too).
I grant a Hansonian account can accommodate the sort of ‘guided by efficacy’ data I suggest via ‘pretending to actually try’ considerations, but I would suggest this almost becomes an epicycle: any data which supports medicine being about healing can be explained away by the claim that they’re only pretending to be about healing as a circuitous route to signalling. I would say the general ethos of medicine (EBM, proliferation of trials) looks like a pro tanto reason in favour of it being about healing, and divergence from this (e.g. what happened to Semmelweis, other lags) is better explained by doctors being imperfect and selfish, and patients irrational, rather than by both parties adeptly following a signalling account.
But I struggle to see what evidence could neatly distinguish between these cases. If you have an idea, I’d be keen to hear it. :)
I agree with the selection worry re. MetaMed’s customers: they are also presumably selected from people whom modern medicine didn’t help, which may also have some effects (not to mention making MetaMed’s task harder, as their pool will be harder to treat than unselected-for-failure cases who see the doctor ‘first line’). I’d also (with all respect meant to the staff of MetaMed) suggest staff of MetaMed may not be the most objective sources on why it failed: I’d guess people would prefer to say their startup failed because of the market or product-market fit, rather than ‘actually, our product was straight worse than our competitors’.
But I struggle to see what evidence could neatly distinguish between these cases. If you have an idea, I’d be keen to hear it. :)
I’m not sure there’s much of a difference between the “doctors care about healing, but run into imperfection and selfishness” interpretation and the “doctors optimize for signalling, but that requires some healing as a side effect” interpretation besides which piece goes before the ‘but’ and which piece goes after.
The main difference I do see is that if ‘selfishness’ means ‘status’ then we might see different defection than if ‘selfishness’ means ‘greed.’ I’m not sure there’s enough difference between them for a clear comparison to be made, though. Greedy doctors will push for patients to do costly but unnecessary procedures, but status-seeking doctors will also push for patients to do costly but unnecessary procedures because it makes them seem more important and necessary.
Regarding arguments that the allocation of medical resources, particularly in the U.S., is wasteful and harmful in many cases—I agree in general, though the specifics are messy, and I don’t find Robin’s posts on the matter very well argued*. I’m most interested in this bit:
This is statisticians and efficiency experts and so on trying to apply standard industrial techniques to medicine and getting pushback that looks ludicrous to me. For example, human diagnosticians perform at the level of, or worse than, simple algorithms (I’m talking linear regressions, here, not even neural networks or decision trees or so on), and this has been known in the efficiency literature for well over fifty years
Particularly since your initial claim that had me raising eyebrows was that MetaMed failed because they have great diagnostics, but medicine doesn’t want good diagnostics.
Edit: *In the RAND post he argues that lower co-pays in a well-insured population resulted in no marginal benefit to health (I’m unconvinced by this but I’d rather not go there), and therefore that most studies showing a positive effect of medicine are a sham. I’m not sure if he thinks that statins and insulin are a scam, but this is a bold and unjustified conclusion. The RAND experiment is not equipped to evaluate the overall healthcare effects of medicine, and that was not its main purpose—it was for examining healthcare utilization. The specific health effects of common interventions are known by studying them directly, and getting patients to follow the treatment protocols that get those results is, as far as I know, an unsolved problem.
Particularly since your initial claim that had me raising eyebrows was that MetaMed failed because they have great diagnostics, but medicine doesn’t want good diagnostics.
Ah, that’s a slightly broader claim than the one I wanted to make. MetaMed, especially early on, optimized for diagnostics and very little else, and so ran into problems like “why is the report I paid $5,000 for so poorly typeset?”. So it’s not that medicine / patients want bad diagnostics ceteris paribus, but that the tradeoffs they make between the various features of medical care make it clear that healing isn’t the primary goal.
The RAND experiment is not equipped to evaluate the overall healthcare effects of medicine, and that was not its main purpose—it was for examining healthcare utilization.
As I understand it, the study measured health outcomes at the beginning and end of the study, as well as utilization during the study. The group with lower copays consumed much more medicine than the group with higher copays, but was no healthier. This suggests that the marginal bit of medicine—i.e. the piece that people don’t consume, but would if it were cheaper or do consume but wouldn’t if it were more expensive—doesn’t have a net impact. (Anything that it would do to help is countered by the risks of interacting with the medical system, say.)
I think I should also make it clear that there’s a difference between medicine, the attempt to heal people, and Medicine, the part of our economy devoted to such, just like there’s a distinction between science and Science. One could make a similar claim that Science Isn’t About Discovery, for example, which would seem strange if one is only thinking about “the attempt to gain knowledge” instead of the actual academia-government-industry-journal-conference system. Most of Robin’s work is on medical spending specifically, i.e. medicine as actually practiced instead of how it could be practiced.
“People evaluated this report solely using non-medical considerations” is not the same as “medical considerations aren’t the primary goal” in the way that is normally understood. The non-medical considerations serve as a filter.
I want to read a book with a good story (let’s call that a good book). However, I don’t want to read a good book that will cost me $5000 to read. By your definition, that means that my primary goal is not to read a good book, my primary goal is to read a cheap enough book.
That is not how most people use the phrase “primary goal”.
The non-medical considerations serve as a filter. … That is not how most people use the phrase “primary goal”.
Which suggests to me that those are the primary goal. Now, you might say “but most people are homo hypocritus, not homo economicus, so ‘primary goal’ should mean ‘stated goal’ instead of ‘actual goal’.” And if that’s your reply, go back and reread all my posts replacing “primary goal” with “actual goal,” because the wording isn’t specific.
I want to read a book with a good story (let’s call that a good book). However, I don’t want to read a good book that will cost me $5000 to read. By your definition, that means that my primary goal is not to read a good book, my primary goal is to read a cheap enough book.
Your primary goal is your life satisfaction, and good books are only one way to achieve that; if you think you can get more out of $5k worth of spending in other areas than on books, this lines up with my model.
(I will note, though, that one can view many college classes as the equivalent of “spending $5000 to read a good book.”)
The relevant comparison is more like this one: suppose you preferred a poorly reviewed book with a superior cover to a well reviewed book with an inferior cover. Then we could sensibly conclude that you care more about the cover than the reviews, even if you verbally agree that reviews are more likely to be indicative of quality than the cover.
And if that’s your reply, go back and reread all my posts replacing “primary goal” with “actual goal,” because the wording isn’t specific.
It doesn’t make any more sense with that. Pretty much nobody would say that because they wouldn’t do Y if X wasn’t true, X is their actual goal for Y. Any term that they would use is such that substituting it in makes your original statement not very insightful.
(For instance, most people wouldn’t call X a primary goal or an actual goal, but they might call X a necessary condition. But if you were to say “people found something other than healing to be a necessary condition for buying a report”, that would not really say much that isn’t already obvious.)
The relevant comparison is more like this one: suppose you preferred a poorly reviewed book with a superior cover to a well reviewed book with an inferior cover.
I prefer a poorly reviewed book that costs $10 to a well reviewed book that costs $5000. By your reasoning I “care more about the price than about the reviews”.
Pretty much nobody would say that because they wouldn’t do Y if X wasn’t true, X is their actual goal for Y.
I think this reveals our fundamental disagreement: I am describing people, not repeating people’s self-descriptions, and since I am claiming that people are systematically mistaken about their self-descriptions, of course there should be a disagreement between them!
That is, suppose Alice “goes to restaurants for the food” but won’t go to any restaurants that have poor decor / ambiance, but will go to restaurants that have good ambiance and poor food. If Bob suggests to Alice that they go to a hole-in-the-wall restaurant with great food, and Alice doesn’t like it or doesn’t go, then an outside observer seems correct in saying that Alice’s actual goal is the ambiance.
Now, sure, Alice could be assessing the experience along many dimensions and summing them in some way. But typically there is a dominant feature that overrides other concerns, or the tradeoffs seem to heavily favor one dimension (perhaps there need to be five units of food quality increase to outweigh one unit of ambiance quality decrease), which cashes out to the same thing when there’s a restricted range.
I prefer a poorly reviewed book that costs $10 to a well reviewed book that costs $5000. By your reasoning I “care more about the price than about the reviews”.
I think you do care more about the price than about the reviews? That is, if there were a book that cost $5k and there were a bunch of people who had read it and said that the experience of reading it was life-changingly good and totally worth $5k, and you decided not to spend the money on the book, it’s clear that you’re not in the most hardcore set of story-chasers, but instead you’re a budget-conscious story-chaser.
To bring it back to MetaMed, oftentimes the work that they did was definitely worth the cost. People pay hundreds of thousands of dollars for treatment of serious conditions, and so the idea of paying five thousand dollars to get more diagnostic work done to make sure the other money is well-spent is not obviously a strange or bad idea, whereas paying $5k for a novel is outlandish.
That’s fighting the hypothetical.
I don’t see why you think that. You could argue it’s reference class tennis, but if your point is “people don’t do weird thing X” and in fact people do weird thing X in a slightly different context, then we need to reevaluate what is generating the weirdness. If people do actually spend thousands of dollars in order to read a book (and be credentialed for having read it), then a claim that you don’t want to pay for it becomes a statement about you instead of about people in general, or a statement about what features you find most relevant.
(I don’t know your educational history, but suppose I was having this conversation with an English major who voluntarily took college classes on reading books; clearly the class experience of discussing the book, or the pressure to read the book by Monday, is what they’re after in a deeper way than they were after reading the book. If they just cared about reading the book, they would just read the book.)
I am describing people, not repeating people’s self-descriptions, and since I am claiming that people are systematically mistaken about their self-descriptions, of course there should be a disagreement between them!
I’m complaining about your terminology. Terminology is about which meaning your words communicate. Being wrong about one’s self-description is about whether the meaning you intend to communicate by your words is accurate. These are not the same thing and you can easily get one of them wrong independently of the other.
I think you do care more about the price than about the reviews? That is...
The sentence after the “that is” is a nonstandard definition of “caring more about the price than about the reviews”.
That’s fighting the hypothetical.
I don’t see why you think that.
It’s fighting the hypothetical because the hypothetical is that I do not want to pay $5000 for a book. Pointing out that there are situations where people want to pay $5000 for a book disputes whether the situation laid out in the hypothetical actually happens. That’s fighting the hypothetical. Even if you’re correct, whether the situation described in the hypothetical can actually happen is irrelevant to the point the hypothetical is being used to make.
but if your point is “people don’t do weird thing X”
My point is not “people don’t do weird thing X”, my point is that people do not use the term X for the type of situation described in the hypothetical. A situation does not have to actually happen in order for people to use terms to describe it.
A good place to get started there is Epistemology and the Psychology of Human Judgment, summarized on LW by badger.
Thanks, I’ll try to find the relevant parts.
This suggests that the marginal bit of medicine—i.e. the piece that people don’t consume, but would if it were cheaper or do consume but wouldn’t if it were more expensive—doesn’t have a net impact
I didn’t want to get too deep into this discussion, because I don’t actually disagree with the weak conclusion that a lot of people receive too much healthcare and that completely free healthcare is probably a bad idea. But Robin Hanson doesn’t stop there; he concludes that the rest of medicine is a sham and that the fact that other studies show otherwise is a scandal. As to why I don’t buy this: the RAND experiment does not show that health outcomes do not improve. It shows that certain measured metrics do not show a statistically significant improvement in the whole population. In fact, in the original paper, the risk of dying was decreased for the poor, high-risk group but not for the entire population. Which brings up a more general problem—such a study is obviously going to be underpowered for any particular clinical question, and it isn’t capable of detecting benefits that lie outside of those metrics.
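To put rough numbers on “underpowered”, here is a minimal sketch; the effect size is an invented small one, not the RAND study’s actual parameters:

```python
# Sketch of why a population-wide experiment is underpowered for any
# single clinical endpoint. The effect size is invented for illustration;
# it is NOT taken from the RAND data.
from statsmodels.stats.power import NormalIndPower

# Sample size per arm needed to detect a small standardized effect (0.05,
# roughly a percentage-point-scale difference in an outcome rate) with
# 80% power at the usual 5% significance level:
n = NormalIndPower().solve_power(effect_size=0.05, alpha=0.05, power=0.8)
print(f"~{n:,.0f} participants per arm")  # several thousand per arm
```

A study sized for aggregate utilization questions can easily be an order of magnitude too small for each individual clinical question nested inside it.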
I’m not buying your elevator pitch. Primarily because lots of data is not nearly enough. You need smart people and, occasionally, very smart people. This means that
companies had access to tons of data that they could use to ACTUALLY make better decisions
is not true because they lack people smart enough to correctly process the data, interpret it, and arrive at the correct conclusions. And
the management consulting companies would come in as outsiders, charge a bunch of money, and use their clout to use the data to make big decisions
is also not quite true because companies like McKinsey and Bain actually look for and hire very smart people—again, it’s not just data. Besides, in a lot of cases external consultants are used as hatchet men to do things that are politically impossible for the insiders to do, that is, what matters is not their access to data but their status as outsiders.
there’s no objective way to tell which companies are actually good at making decisions
Sure there is—money. It’s not “pure” capitalism around here, but it is capitalism.
An objective metric(bayesian scoring rule) that shows how good an organization or individual is at predicting the future.
So, what’s wrong with the stock price as the metric?
Besides, evaluating forecasting capability is… difficult. Both theoretically (out of many possible futures only one gets realized) and practically (there is no incentive for people to give you hard predictions they make).
I don’t think that McKinsey’s and Bain’s business is crunching data. I think it is renting out smart people.
I was recently involved in a reasonably huge data mining & business intelligence task (which I probably should not disclose). I could say it was an eye-opener, but I am old enough to be cynical and disillusioned, so it was not a surprise.
First, we had some smart people on the team, shamelessly including myself :-) (“smart” here almost by definition means “experts in programming, software development, and enough mathematics and statistics”), doing the software implementation, data extraction and statistics. Then there were slightly less smart people, but experts in the domain being studied, who were supposed to make sense of the results and write the report. These people were pulled off the team, because they were very urgently needed for other projects.
Second, the company bought a very expensive tool for data mining and statistical analysis, and subcontracted another company to extend it with the necessary functionality. The tool did not work as expected; the subcontracted extension was two months late (they finished it at the time the final report should have been written!), and it was buggy and did not work with the new version of the tool.
Third, it was quite clear that the report should be bent towards what the customer wanted to hear (that is not to say it would contain fabricated data—just that the interpretations should be more favourable).
So, those smart people spent their time 1) working around bugs in the software we were supposed to use, 2) writing ad-hoc statistical analysis software to be able to do at least something, 3) analysing data in a domain they were not experts in, and 4) writing the report.
After all this, the report was stellar, the customer extremely satisfied, the results solid, the reasoning compelling.
Had I not been involved, and had I not known how much of the potential had been wasted and on how small a fraction of the data the analysis had been performed, I would consider the final report a nice example of a clever, honest, top-level business intelligence job.
So, those smart people spent their time 1) working around bugs in the software we were supposed to use, 2) writing ad-hoc statistical analysis software to be able to do at least something, 3) analysing data in a domain they were not experts in, and 4) writing the report.
After all this, the report was stellar, the customer extremely satisfied, the results solid, the reasoning compelling.
Had I not been involved, and had I not known how much of the potential had been wasted and on how small a fraction of the data the analysis had been performed, I would consider the final report a nice example of a clever, honest, top-level business intelligence job.
So, this problem is NOT one I’m tackling directly (I’m more asking how they can get smart people like you to build that kludge for much cheaper), but the model does indirectly incentivize better BI tools by creating competition directly on forecasting ability, and not just on signaling ability.
To be frank, I didn’t expect you to, based on our previous conversations about forecasting. You are too skeptical of it, and haven’t read some of the recent research on how effective it can be in a variety of situations.
is not true because they lack people smart enough to correctly process the data, interpret it, and arrive at the correct conclusions.
Exactly, this is the problem I’m solving.
So, what’s wrong with the stock price as the metric?
As I said, the signaling problem. Using previous performance as a metric means that there are lots of good forecasters out there who simply can’t get discovered—right now, it’s signaling all the way down (top companies hire from top colleges, which take from top high schools). Basically, I’m betting that there are lots of organizations and people out there who are good forecasters, but don’t have the right signals to prove it.
Besides, evaluating forecasting capability is… difficult. Both theoretically (out of many possible futures only one gets realized) and practically (there is no incentive for people to give you hard predictions they make).
You should read the linked article on prediction polls—they weren’t even paying people in Tetlock’s study (only giving gift cards not at all commensurate with the work people were putting in) and they solved the problem to the point where they could beat prediction markets.
You are too skeptical of it, and haven’t read some of the recent research on how effective it can be in a variety of situations.
From my internal view I’m sceptical of it because I’m familiar with it :-/
it’s signaling all the way down (top companies hire from top colleges, which take from top high schools)
Um, hiring from top colleges is not quite all signaling. There is quite a gap between, say, an average Stanford undergrad and an average undergrad of some small backwater college.
You should read the linked article on prediction polls—they weren’t even paying people in Tetlock’s study
Um, I was one of Tetlock’s forecasters for a year. I wasn’t terribly impressed, though. I think it’s a bit premature to declare that they “solved the problem”.
With people who claim to have awesome forecasting power or techniques, I tend to point at financial markets and ask why they aren’t filthy rich.
From my internal view I’m sceptical of it because I’m familiar with it :-/
You’re right, I was assuming things about you I shouldn’t have.
Um, hiring from top colleges is not quite all signaling. There is quite a gap between, say, an average Stanford undergrad and an average undergrad of some small backwater college.
Fair point. But the point is that they’re going on something like “the average undergrad” and discounting all the outliers. Especially problematic in this case because forecasting is an orthogonal skillset to what it takes to get into a top college.
With people who claim to have awesome forecasting power or techniques, I tend to point at financial markets and ask why they aren’t filthy rich.
Markets are one of the best forecasting tools we have, so beating them is hard. But using the market to get these types of questions answered is hard (liquidity issues in prediction markets) so another technique is needed.
Um, I was one of Tetlock’s forecasters for a year. I wasn’t terribly impressed, though. I think it’s a bit premature to declare that they “solved the problem”.
What part specifically of that paper do you think was unimpressive?
Not necessarily. Recall that a slight shift in the mean of a normal distribution (e.g. IQ scores) results in strong domination in the tails.
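To put numbers on that, a sketch; the one-third-sd shift is invented for illustration:

```python
from scipy.stats import norm

# A small mean shift (one third of a standard deviation; the figure is
# invented for illustration) produces growing overrepresentation as you
# move out into the right tail.
for cutoff in (1, 2, 3):  # standard deviations above the baseline mean
    p_base = norm.sf(cutoff, loc=0.0, scale=1.0)
    p_shifted = norm.sf(cutoff, loc=1.0 / 3.0, scale=1.0)
    print(f"{cutoff} sd cutoff: {p_shifted / p_base:.1f}x overrepresented")
# Prints roughly 1.6x, 2.1x, 2.8x: typical members barely differ, but the
# far tail is dominated by the shifted group.
```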
Besides, searching for talent has costs. You’re much better off searching for talent at top tier schools than at no-name colleges hoping for a hidden gem.
using the market to get these types of questions answered is hard
What “types of questions” do you have in mind? And wouldn’t liquidity issues be fixed just by popularity?
forecasting is an orthogonal skillset to what it takes to get into a top college.
Let me propose IQ as a common cause leading to correlation. I don’t think the skillsets are orthogonal.
What part specifically of that paper do you think was unimpressive?
I read it a while ago and don’t remember enough to do a critique off the top of my head, sorry...
Besides, searching for talent has costs. You’re much better off searching for talent at top tier schools than at no-name colleges hoping for a hidden gem.
That’s the signalling issue—I’m trying to create a better signal so you don’t have to make that tradeoff.
What “types of questions” do you have in mind? And wouldn’t liquidity issues be fixed just by popularity?
Question Example: “How many units will this product sell in Q1 2016?” (Where this product is something boring, like a brand of toilet paper)
This is a question that I don’t ever see being popular with the general public. If you only have a few experts in a prediction market, you don’t have enough liquidity to update your predictions. With prediction polls, that isn’t a problem.
Why do you call that “signaling”? A top-tier school has a real, actual, territory-level advantage over a backwater college. The undergrads there are different.
If you only have a few experts in a prediction market, you don’t have enough liquidity to update your predictions. With prediction polls, that isn’t a problem.
I don’t know about that not being a problem. Lack of information is lack of information. Pooling forecasts is not magical.
Why do you call that “signaling”? A top-tier school has a real, actual, territory-level advantage over a backwater college. The undergrads there are different.
Because you’re going by the signal (the college name), not the actual thing you’re measuring for (forecasting ability).
I don’t know about that not being a problem. Lack of information is lack of information. Pooling forecasts is not magical.
I meant a problem for frequent updates. Obviously, fewer participants will lead to less accurate forecasts—but with Brier weighting and extremizing you can still get fairly decent results.
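Concretely, here is a minimal sketch of the sort of pooling I mean; the inverse-Brier weights and the extremizing exponent are illustrative choices, not the exact algorithm from the paper:

```python
import numpy as np

def brier(probs, outcomes):
    """Mean Brier score over a forecaster's past binary predictions."""
    return np.mean((np.asarray(probs, dtype=float) - np.asarray(outcomes, dtype=float)) ** 2)

def pooled_forecast(current, histories, outcomes, alpha=2.0):
    # Lower Brier = better, so weight by inverse score (floored to avoid /0).
    scores = np.array([max(brier(h, outcomes), 1e-3) for h in histories])
    weights = (1.0 / scores) / np.sum(1.0 / scores)
    p = float(np.dot(weights, current))
    # Extremize: p^a / (p^a + (1-p)^a); a > 1 pushes the pool away from 0.5.
    return p ** alpha / (p ** alpha + (1.0 - p) ** alpha)

histories = [[0.9, 0.2], [0.6, 0.5], [0.8, 0.1]]  # past forecasts per person
outcomes = [1, 0]                                  # how those questions resolved
current = [0.7, 0.55, 0.65]                        # live forecasts to pool
print(pooled_forecast(current, histories, outcomes))  # ~0.80
```

The transform p^a / (p^a + (1-p)^a) with a > 1 is the standard extremizing trick from the aggregation literature; it compensates for pooled forecasts being dragged toward 0.5.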
I actually believe that management consulting companies are paid to help companies make big decisions. I believe this because usually they are hired when a company needs to make a big decision.
Decision theory shows us that a huge portion of making big decisions is making accurate predictions about the future (and the other pieces, such as determining an accurate utility function, are best left to the organizations themselves).
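To make that concrete with invented numbers: once the organization fixes its utilities, the choice reduces to an expected-value comparison that turns entirely on the forecast probability.

```python
# Toy decision: launch a product or skip. The payoffs (utilities) are the
# organization's job to set; once they are fixed, the choice is driven
# entirely by the probability forecast. All numbers are invented.
p_success = 0.6
payoffs = {"win": 10_000_000, "lose": -4_000_000, "skip": 0}

ev_launch = p_success * payoffs["win"] + (1 - p_success) * payoffs["lose"]
print("launch" if ev_launch > payoffs["skip"] else "skip",
      f"(EV = ${ev_launch:,.0f})")
# With these payoffs the decision flips when p_success crosses
# 4/14 ~= 0.29, so everything hinges on the forecast.
```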
Decision theory shows us that a huge portion of making big decisions is making accurate predictions about the future
Where does it show us that’s true?
More importantly, how do you know that the customers of management consulting believe that’s true? Do you think that the average Fortune 500 CEO invests resources into internal prediction-making in a way that indicates he believes this is true?
I think if the average Fortune 500 CEO believed this to be true, you would see many more internal prediction markets in companies: programs for internal prediction markets that are sold not on team-building efforts but on actually producing actionable data.
I mean, I’m convinced by the math. You are welcome to disagree with the math, but you’ll have to show me some other math that disproves everything that decision theorists have already figured out.
I think if the average Fortune 500 CEO believed this to be true, you would see many more internal prediction markets in companies: programs for internal prediction markets that are sold not on team-building efforts but on actually producing actionable data.
We have different models here. In my model, prediction markets aren’t used because politics are set up for people who can make excuses—prediction markets would remove the ability of those people to make excuses, so the political factions don’t allow them. Management consulting firms solve this by coming in as outsiders endorsed by the Fortune 500 CEO (therefore bypassing most of the politics) and making those predictions themselves. I’m just trying to bring down the cost of these outsiders, so that the CEO can use them for many more decisions.
I mean, I’m convinced by the math. You are welcome to disagree with the math, but you’ll have to show me some other math that disproves everything that decision theorists have already figured out.
The math depends heavily on the axioms that you use. It’s quite easy to choose axioms in a way that gets you the outcome you are looking for. The question is whether those axioms are warranted.
prediction markets would remove the ability of those people to make excuses, so the political factions don’t allow them
Why can’t the CEO order prediction markets to be created? Do you think the political factions wouldn’t create markets if ordered to do so?
The math depends heavily on the axioms that you use. It’s quite easy to choose axioms in a way that gets you the outcome you are looking for. The question is whether those axioms are warranted.
As I said, you’re welcome to show me some axioms that show that forecasting is NOT a huge part of making big decisions.
Why can’t the CEO order prediction markets to be created? Do you think the political factions wouldn’t create markets if ordered to do so?
Because good CEOs understand that buy-in is essential for any project. You can order projects all day and alienate your workforce, but that’s not how Fortune 500 CEOs got to be Fortune 500 CEOs.
As I said, you’re welcome to show me some axioms that show that forecasting is NOT a huge part of making big decisions.
The general idea is that big decisions in most contexts get made by experts via informed intuition, not by shutting up and calculating.
The math you are looking at is shut-up-and-calculate math.
Because good CEOs understand that buy-in is essential for any project.
Do you think people get substantially more alienated if the CEO says “Let’s do an internal prediction market” than when he transfers the same power to management consultants? Especially when the consultants are suddenly forced by your system not to make politically acceptable suggestions but to focus on true predictions?
The general idea is that big decisions in most contexts get made by experts via informed intuition, not by shutting up and calculating. The math you are looking at is shut-up-and-calculate math.
There’s substantial room for both in prediction polls.
Do you think people get substantially more alienated if the CEO says “Let’s do an internal prediction market” than when he transfers the same power to management consultants?
The alienation doesn’t tank the project because it’s not being run by the people being alienated.
Isn’t this a niche filled by ‘business intelligence’ and ‘data science’? They call it a lot of different things, sure, but they seem to be operating in the same space; at least, they may seem to, to a non-technical executive. An exception is mid-to-small business—I don’t think there’s a lot of penetration there.
In practice, most companies with BI dashboards and data science analytics experience more information overload than before, because they don’t have the human capital to make sense of all that information.
There are limited cases (e.g. weather reporting and website split testing) where the niche is narrow enough that the computer can basically do everything on its own, but computers aren’t at the point yet (and likely won’t be for a long time) where they can use generic data to make complex decisions.
GiveWell already uses expert advice for expedient impact assessments, albeit on a small scale, without using academic know-how, and with a suboptimal choice and choice architecture of their experts. Hope you can improve on it :)
You’ve picked the wrong problem domain for the scoring rules. Brier scoring comes from probability assessment; there are already more sophisticated approaches to this problem, several levels removed from the mathematical theory and synthesising several theorems.
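For reference, here is a sketch of the two classic proper scoring rules for a binary forecast; both reward honest probabilities in expectation, but the log score punishes confident misses far more harshly:

```python
import math

# The two classic proper scoring rules for a binary forecast p of an
# event with outcome o in {0, 1}. Both reward honest probabilities in
# expectation; the log score punishes confident misses far more harshly.

def brier_score(p, o):   # lower is better, bounded in [0, 1]
    return (p - o) ** 2

def log_score(p, o):     # lower is better, unbounded above
    return -math.log(p if o == 1 else 1.0 - p)

for p in (0.6, 0.9, 0.99):
    print(p, brier_score(p, 0), round(log_score(p, 0), 2))
# At p = 0.99 and outcome 0: Brier ~= 0.98, log score ~= 4.61.
```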
The most proximate implementations of what you are suggesting are either Delphi groups (risk analysis) or prediction markets (mainly the rationalist subculture, but also academic). You probably already know how prediction markets work, and you can look up ‘expert elicitation’ or ‘eliciting expert judgement’ and similar terms if you’re interested. Happy to answer any tougher questions you can’t get answered.
There are structured approaches to Delphi groups which incorporate Bayes’ rule and insights from the psychology of eliciting and structuring expert judgement that you could mimic. There is at least one major corporate consultancy focused on this already, however. AFAIK there are no implementations of this kind on the blockchain. Whether that is a worthwhile competitive advantage is another question.
You have a strategic mindset; I like it. If I’ve interpreted your question accurately, the reason others in the know may not have responded is the XY problem.
There are structured approaches to Delphi groups which incorporate Bayes’ rule and insights from the psychology of eliciting and structuring expert judgement that you could mimic.
Yes, the technology I’m using (prediction polls) is essentially this. It’s Delphi groups weighted by Brier scores. The paper I link to above compares them to a prediction market with the same questions—with proper extremizing algorithms, the prediction poll actually does better (especially early on).
The reason I came up with this solution is that I wanted to use prediction markets for a specific class of impact assessments, but they weren’t suited for the task. Prediction markets require either a group of interested suckers to take the bad bets, or a market maker who is sufficiently interested in the outcome to be willing to take the bad side on ALL the sucker bets. My solution complements prediction markets by being much better in those cases: it avoids the zero-sum game and instead just directly pays experts for their expertise.
I have no direct experience with management consulting.
My opinions are formed by: my own observations of office politics; reading Dilbert; reading Robin Hanson; listening to stories of my friend who is an IT consultant. But I trust the other sources because they are compatible with what I observe.
Maybe it depends on the company, and maybe the one where I work now is an unusually dysfunctional one (or maybe I just have better information channels and pay better attention), but most management decisions are completely idiotic. What the managers are good at optimizing for is keeping their jobs. Even that is not done by making sure the projects succeed, but rather by destroying internal competitors.
For example, one of our managers was fired because our IT support department was actively sabotaging our project for a few months and we had no budget to seek help elsewhere; so we missed a few deadlines because we didn’t even have functioning servers, and then the guy was fired for incompetence. The new manager is a good friend of the IT support manager, so when he got the role, our IT support department stopped actively sabotaging us. This was all he ever did for us; otherwise he almost completely ignores the project. He is praised as a competent leader, because now we have succeeded in catching up with the schedule. That was mostly because of the hard work of our three most competent developers. One of them was recently fired, because we had too many people on the team. And that’s because the new manager also brought a few developers from his old team; they do absolutely nothing, officially because they are experts on a different programming language, but we secretly suspect they actually don’t even know programming, so they don’t contribute and mostly don’t even go to work, but now they are part of our budget, so someone else had to go. Why not pick randomly?
How is it possible that such systems survive? My explanation is that nerds are really bad at playing power games (actually so bad that they don’t even realize that such games exist or need to be played; they may even object violently, which makes them really bad allies in such games, which is why no one will even try to ally with them and educate them). Instead our weakness is the eternal childish desire to be praised by a parent figure for being smart. So whenever shit hits the fan, the developers will work extra hard to fix the problem—without even thinking about using that as a leverage to gain more power in the organization. Most nerds are too shy to ask for a pay raise, even if they have just saved the management’s collective asses. So the managers can afford to ignore the technical aspects completely as something that happens automatically at a constant cost, and can focus fully on their own internal fights.
A few months ago I was ‘jokingly’ trying to get my colleagues to explore the idea of what could happen if the developers decided to form a union. How the management would be completely at our mercy, because they don’t understand anything, are unable to hire replacements quickly (it took them forever to find and hire us), and with the tight schedule any waste of time would totally sink the project. Even if we used this power for goals compatible with the company’s goals, we could negotiate to remove a lot of inefficiency and improve our working conditions. But we could also all ask for a raise, and for the company this whole revolution would still be profitable. -- My colleagues listened to me, mostly agreed with some conclusions on an abstract level, and then laughed because it was obviously such a silly idea. They all live in the imaginary universe where your income and quality of life are directly proportional to your coding skills and nothing else matters. I was screaming internally, but I politely laughed with them. Now some of them are being fired, regardless of their competence and hard work, and more will follow. Har har. I don’t worry about them too much; it will be easy for them to find another job. But it will be more or less the same thing, over and over again. They had an opportunity for something better and they ignored it completely. Worst case, if the plan backfired, they would be in the same situation they are in now.
As Plato said, one of the penalties for refusing to participate in politics is that you end up being governed by your inferiors. That’s IT business in a nutshell.
That was a very entertaining read, thanks.
It is also possible that you aren’t aware of most of what your management does. I’ll take your word for it that many of their decisions that are visible to you are poor (maybe most of their decisions are, but I’m not yet convinced). As for management consulting, I suppose that is an inferential gap that is going to be hard to bridge.
The implication of my story for management consulting is: if this company (assuming that I have described it correctly) ever hired a management consulting company, why would they decide to do it, how would they choose the specific company, what task would they give to the company, and how would they use the results?
My model says that they wouldn’t hire the management consulting company except as a move in some internal power struggle; the choice would most likely be made on the basis of “some important person’s friend works for the consulting company or recommended the company”; they would give the company a completely false description of our organization and would choose the most uninformed and incompetent people as speakers (for example, they might choose one of those ‘programmers’ who doesn’t contribute to our project as the person who will describe the project to the consultants); and whatever reports the consulting company gave us, our management would completely reinterpret them to fit their existing beliefs.
In other words, I have no direct information about the management consulting companies, but I have a model of their customers; and that model says that in the market for management consulting the actual quality of the advice is irrelevant. (Unless companies like this are a minority on the market.)
The upper echelons don’t invite me to their meetings, so there is always a chance. But when I tried to socialize with some of the lower managers, the story was usually that the higher managers mostly sabotage their work by “hit-and-run management”. It works like this: the higher manager knows nothing about the project and most of the time doesn’t even care. Suddenly they become interested in some detail (e.g. something went wrong and the customer complained to them, or they just randomly heard something and decided to “be useful”). So they come and start micromanaging to optimize for that detail, completely ignoring all the context. Sometimes they contribute to improving the detail, and sometimes the detail would have been fixed in exactly the same time even without their contribution; but in the process they usually do harm to all other things.
For example, imagine that there are five known minor bugs in the program, and there are five programmers working on them. Under usual circumstances, all five bugs would be solved in a day, one bug per programmer. But the customer complained about one of the bugs on the phone, so the big boss comes and makes everyone work on that one bug. So at the end of the day, only one of the five bugs is fixed, and the big boss leaves, feeling victorious. (From his point of view the story probably reads: “Without my oversight, this department produced software with an error that bothered the customer, but thanks to my heroic action, the whole problem was solved in a single day. Yay for being agenty!”) Meanwhile, the things that would allow us to detect and fix the bugs more reliably before they even get to the customer, such as automated testing or even using written test scenarios, are ignored for years (this is not an exaggeration), no matter how often the programmers complain about that in meetings.
Another issue is that the managers never cross-check the information they get. That makes “being the first one who has an opportunity to tell managers their version of the story” critical. For example, we need some work done by the IT support department. The support does step 1 of 10, then reports to managers “it’s done” and stops working on the issue. The managers are happy, the programmers keep waiting… a few months later the topic gets mentioned at the meeting, the manager is like: “so, you guys were happy to have this problem solved so quickly, right?”, the programmers are like: “wtf, we keep waiting for months”, the manager is: “wtf, the problem was already solved months ago”, the programmers: “no way!”, the manager: “okay, let’s call the support on the phone”, the support: “sure, the problem was solved months ago… oh yeah, yeah… well, it wasn’t solved completely, there are still a few details missing (such as steps 2 to 10), but the programmers never complained about that so we thought that was okay”, the manager: “guys, seriously, why don’t you communicate more clearly”, the programmers: “well, the steps 1 to 10 were clearly described in the specification we had to write for the support department”, the support: “well, we were not sure you really needed that”… And the next time we need something, the support again mostly ignores it and reports the work as done, and no manager bothers to verify. Similarly when we get incomplete specifications, etc. Many people in the company use this opportunity to not do their work, report it as done, and later use some half-assed excuse. Only the programmers have to write the code, otherwise the customer would complain. Anyone else only generates internal complaints, which are not taken seriously by the management.
Somewhat related to this: imagine that you manage a project where three employees, A, B, C, each have to do one aspect of the project, and the next one can start their job only after the previous one has finished. For example: specification, programming, testing. And you have 10 days to deliver the results to the customer. Well, I would certainly create internal deadlines, e.g. A must deliver their part on day 3, B must deliver their part on day 6, C must deliver their part on day 9, and there is one day of reserve. If on day 4 employee A tells me he is not ready and will probably not complete it even today, I would treat that as an impending crisis, because every delay caused by A means less time for B and C. -- Instead, our managers simply write into their private calendars that on day 10 we need to deliver the product to the customer, and they feel their work is done. Person A usually takes 8 days to do their part, and even then they often hand incomplete work to B, who will work like crazy for the remaining 2 days, and C’s part is usually skipped (C is testing, which is why we then have so many bugs reported by the customer). The managers start being active on day 10, usually when they return from lunch, and start reminding everyone that today is the critical day when we need to deliver the product to the customer. Employee A has their work already finished, so there is no pressure on them; all the pressure goes to B and C. And this keeps happening again and again, every few weeks, for years. If you try to speak with the managers about it, they tell you “yes, we are aware of the problem, and we are working on solving it”. Just like they told you a year ago.
Another failure mode typical for our company is the following: there is some work X that needs to be done, but none of our employees is an expert on X, and we can’t solve the problem using google (also we have other work to do). We keep reminding the management that we would need some expert on X; either a new employee, or at least an external consultant that would spend a day or two working with us. There are two ways this can end. Option 1 -- a year or two later the management finally tells us they will invite the external expert, but only for an hour or two, because the expert is very expensive. We keep waiting. A week or two later we are told that the expert already was here. “Really? Who did he talk with?” No one knows, but after another week we find out it was someone irrelevant who knows nothing about our project or about X. “So what did he ask the expert?” Most likely, it was something different that either doesn’t apply to our project, or is so simple that we could have answered that ourselves. “So what did the expert answer?” Sorry, we forgot. Nope, no one took notes. Then the management says: “Okay guys, we already did what you wanted, now please stop making excuses and finally do the work we were supposed to deliver to the customer a year ago.” Option 2 -- a year or two later an expert on X is hired. Everyone in the team celebrates. However, the next day the person is given a task to work on some completely unrelated Y. Why? For some reason management suddenly believes it is the highest priority of the day, although it is something we could have solved without the new guy. So the new guy works on Y and doesn’t have time for X. Then the new guy is told to work on Z, et cetera. A few months later the new guy is annoyed and quits, because he wants to specialize on X, but he was given no time to do that here; so our problems remain unsolved. Then the management says: “Okay guys, we had an expert on X here, now please stop making excuses and complete the work.”
Eh, I could go on like this for days. The point is, I don’t believe there is some higher wisdom there. Other than the fact that we get government projects because of political connections, so the actual quality of our product is irrelevant as long as it works and is completed more or less on time; and even that is often a problem.
This question is something that keeps me up at night.
In the long term, I’m confident that if the latter case is true, my solution will (eventually) outcompete anyone using management consultants. Because of the blockchain-based business model, this is a possibility that the company (in the loosest sense of the word) can handle. This would be the worst-case scenario.
“The market can stay irrational longer than you can stay solvent.”
That’s not how the blockchain works—once the app is there, it exists forever (at least as long as other apps are using that same blockchain), and it can limp along as long as it needs to until the market catches up. It’s one of the key reasons I chose the business model I did (which allows investors to make money from the app being successful, no matter whether that’s from an application of the protocol I’m using, or someone else’s).
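As a toy illustration of what “auditable” buys you (this is just the append-only structure, not a real chain; a real deployment would anchor these digests on an actual blockchain):

```python
import hashlib, json

# Toy tamper-evident log of resolved predictions. A real deployment would
# anchor these digests on an actual blockchain; this only shows why an
# append-only hash chain makes a forecasting track record auditable.

def append(chain, record):
    prev = chain[-1]["hash"] if chain else "genesis"
    body = json.dumps(record, sort_keys=True)
    digest = hashlib.sha256((prev + body).encode()).hexdigest()
    chain.append({"record": record, "prev": prev, "hash": digest})

chain = []
append(chain, {"question": "Q1 widget sales > 10k?", "forecast": 0.7, "outcome": 1})
append(chain, {"question": "Competitor launches by June?", "forecast": 0.4, "outcome": 0})

# Anyone can recompute the digests from the records; editing an old entry
# breaks every later link, so a score history can't be quietly rewritten.
```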
That’s a predominantly satirical source (sometimes a conspiracy theory...). I hope you’re using it that way and aren’t just ignorant...
You’re selling the efficacy of your implementation to firms for making correct decisions. The perception of correct decisions, or the potential thereof, is important.
Unless you also consider Dilbert to be a conspiracy theorist...
People are often optimizing for their own goals, instead of the goals of the organization they are working for. People are stupid. People are running on corrupted hardware. Put these three facts together, and you will see organizations where managers sometimes make genuinely bad decisions, and sometimes make decisions that help them but harm the company; and they will of course deny doing this, and sometimes they are simply lying, but sometimes they honestly believe it.
Often the quality of a decision is hard to measure, or perhaps just too easy to rationalize either way. When a manager makes a bad decision, it does not mean that the project will fail. Sometimes the employees will work harder or take overtime to fix the problems. Sometimes the company is lucky because their customer is even more dysfunctional than they are, so they won’t notice how faulty and overpriced the delivered product is. When a manager makes good decisions, it does not mean that the project will succeed. Sometimes other managers that are supposed to cooperate with the department will sabotage the project to get rid of an internal competitor. Sometimes there are unpredictable forces outside, for example a new competing product appears on the market, and it happens to be better and cheaper, or just has better marketing. -- And of course, when a project succeeds, the typical manager will attribute it to their own smart decisions, and when a project fails, there will always be someone or something else to blame. It’s like in politics.
I know about scoring rules and probability assessments. Email me and we’ll set up a time to talk.
Similar to Viliam in a sibling comment, I think that this is the sort of idea that would work in the ideal world but not the real world. To channel Hanson, “Consulting is not about advice,” and thus a product that seeks to disrupt consulting by providing superior advice will simply fail. (Compare to MetaMed, which tried to disrupt medicine by providing superior diagnostics. Medicine is not about healing!)
This is something that I would be interested in reading, so I think I found the link in case anyone else is interested.
Side story: I once did a case-study phone interview with a consulting firm, using a real-world example of one of their clients, a major credit card company. They were tasked with finding ways to increase revenue. Without any background information I gave them a bunch of wacky, out-of-the-box, on-the-spot answers. I asked them what the real answer was.
The answer? Need more customers. That’s $500k over the course of a month for 6 MBAs. But the client gets a 30-page PDF with words on it documenting their findings, so shrug.
That link goes to lesswrong.com… is there another link?
Here are some articles from Overcoming Bias that seem relevant:
Consulting Isn’t About Advice
Too Much Consulting?
Status As Strength
Freakonomics On Consulting
Some quotes:
Thanks, these are really helpful!
I think this is the most legitimate objection to the entire model as stated. To be fair, the reason I want this out there is NOT just to disrupt consulting—my investors asked me for the best mass-market play, and this was it. I have other uses for a cheap, accurate forecasting tool that don’t involve Fortune 500 companies :).
edit: I do think that no matter what happens, a tool like this will eventually come to dominate because it’s just better—but it may take new companies out-competing other companies using the tool, which is much slower than convincing existing companies to switch from management consulting.
I’ll send you an email now.
I’d love to hear this expanded on. On the surface this comment pattern matches to the sort of low quality anti-establishment attitude that is common around here, so I’m surprised to see you write it.
Three main sources. (But first the disclaimer About Isn’t About You seems relevant—that is, even if medicine is all a sham (which I don’t believe), participating in the medical system isn’t necessarily a black mark on you personally.)
First is Robin Hanson’s summary on the literature on health economics. The medicine tag on Robin’s blog has a lot, but a good place to start is probably Cut Medicine in Half and Medicine as Scandal followed by Farm and Pet Medicine and Dog vs. Cat Medicine. To summarize it shortly, it looks like medical spending is driven by demand effects (we care so we spend to show we care) rather than supply effects (medicine is better so we consume more) or efficacy (we don’t keep good records of how effective various doctors are). His proposal for how to fund medicine shows what he thinks a more sane system would look like. (As ‘cut medicine in half’ suggests, he doesn’t think the average medical spending has a non-positive effect, but that the marginal medical spending does, to a very deep degree.)
Second is the efficiency literature on medicine. This is statisticians and efficiency experts and so on trying to apply standard industrial techniques to medicine and getting pushback that looks ludicrous to me. For example, human diagnosticians perform at the level of, or worse than, simple algorithms (I’m talking linear regressions here, not even neural networks or decision trees or so on), and this has been known in the efficiency literature for well over fifty years. Only in rare cases does this actually get implemented in practice (for example, a flowchart for dealing with heart attacks in emergency rooms was popularized a few years back and seems to have had widespread acceptance). It’s kind of horrifying to realize that our society is smarter about, say, streamlining the production of cars than streamlining the production of health, especially given the truly horrifying scale of medical errors. Stories like Semmelweis and the difficulty getting doctors to wash their hands between patients further expand this view.
Third is from ‘the other side’; my father was a pastor and thus spent quite some time with dying people and their families. His experience, which is echoed by Yvain in Who By Very Slow Decay and seems to be the common opinion among end-of-life professionals in general, is that the person receiving end-of-life care generally doesn’t want it and would rather die in peace, and the people around them insist that they get it (mostly so that they don’t seem heartless). As Yvain puts it:
Once you really grok that a huge amount of medical spending is useless torture, and if you are familiar with what it looks like to design a system to achieve an end, it becomes impossible to see the point of our medical system as healing people.
[edit]And look at today’s Hanson post!
I broadly differ with the hansonian take on medicine. I don’t think Metamed went bust because it offered more effective healing while medicine doesn’t really demand healing; rather, medicine is about healing, generally does this pretty well, and Metamed was unable to provide a significant edge in performance over standard medicine. (I should note I am a doctor, albeit a somewhat contrarian one. I wrote the 80k careers guide on medicine.)
I think medicine is generally less fertile ground for hansonian signalling accounts, principally because health is so important for our life and happiness that we’re less willing to sacrifice it to preserve face (I’d wager it is an even better tax on bs than money). If the efficacy of marginal health spending is near zero in rich countries, that seems evidence in support of ‘medicine is really about healing’: we want to live healthily so much we chase the returns curve all the way to zero!
There are all manner of ways in which western world medicine does badly, but I think sometimes the faults are overblown, and the remainder are best explained by human failings rather than medicine being a sham practice:
1) My understanding of the algorithms for diagnosis is that although linear regressions and simple methods can beat humans at very precise diagnostic questions (e.g. ‘Given these factors of a patient who is mentally ill, what is their likelihood of committing suicide?’), humans still perform better in messier (and more realistic) situations. It’d be surprising for IBM to unleash Watson on a very particular aspect of medicine (therapeutic choice in oncology) if simple methods could beat doctors across most of the board.
(I’d be very interested to see primary sources if my conviction is mistaken)
2) Medicine has become steadily more and more protocolized, and clinical decision rules, standard operating procedures and standards of care are proliferating rapidly. I agree this should have happened sooner: that Atul Gawande’s surgical checklist happened within living memory is amazing, but it is catching on, and (mildly against hansonian explanations) has been propelled by better outcomes.
I can’t speak for the US, but there are clear protocols in the UK about initial emergency management of heart attacks. Indeed, take a gander at the UK’s ‘NICE Pathways’ which gives a flow chart on how to act in all circumstances where a heart attack is suspected.
3) I agree that the lack of efficacy information about individual doctors isn’t great. Reliable data on this is far from trivial to acquire, however, and that, combined with doctors’ understandable self-interest in not being too closely monitored, seems to explain this lacuna as well as the hansonian story does. (Patients tend to want to know this information if it is available, which doesn’t fit well with them colluding with their doctors and family in a medical ritual unconnected to their survival.)
4) Over-treatment is rife, but the US is generally held up as an anti-exemplar of this fault, and (at least judging by the anecdotes) medics in the UK are better (albeit still far from perfect) at not flogging the patient to death with medical torture. Outside of this zero-or-negative margin, performance is better: it is unclear how much is attributable to medicine, but life expectancy and disease-free life expectancy are rising, and age-standardized mortality rates for most conditions are declining.
Now, why Metamed failed (I appreciate one should get basically no credit for predicting a start up will fail given this is the usual outcome, but I called it a long time ago):
Metamed’s business model relied on there being a lot of low-hanging fruit to pluck: that in many cases, a diagnosis or treatment would elude the clinician because they weren’t apprised of the most recent evidence, were only able to deal in generalities rather than personalized recommendations, or were just less adept at synthesizing the evidence available.
If it were Metamed versus the average doctor (the one who spends next-to-no time reading academic papers, who is incredibly busy, stressed out, and so on), you’d be forgiven for thinking that Metamed has an edge. However, medics (especially generalists) have long realized they have no hope of keeping abreast of a large medical literature on their own. Enter division of labour: they instead commission the relevant experts to survey, aggregate and summarize the current state of the evidence base, leaving them the simpler task of applying it in their practice. To make sure it stays up to date, they commission the experts to repeat this fairly often.
I mentioned NICE (National Institute of Clinical Excellence) earlier. They’re a body in the UK who are responsible (inter alia) for deciding when drugs and treatments get funded on the NHS. They spend a vast amount of time on evidence synthesis and meta-analysis. To see what sort of work this produces, google ‘NICE {condition}’. An example for depression is here. Although I think the UK is world-leading in this respect, there are similar bodies in other countries, as well as commercial organizations (e.g. Uptodate).
Against this, Metamed never had any edge: they didn’t have groups of subject matter experts to call upon for each condition or treatment in question, nor (despite a lot of mathsy inclination amongst them) did they by and large have parity in terms of meta-analysis, evidence synthesis and related skills. They were also outmatched in terms of quantity of man hours that could be deployed, and the great headstart NICE et al. already had. When their website was still up I looked at some of their example reports, and my view was they were significantly inferior to what you could get via NICE (for free!) or Uptodate or similar services for their lower fees.
Metamed might have had a hope if, in the course of producing these general evidence summaries, a lot of fine-grained data was being aggregated out to produce something ‘one size fits all’; their edge would have been going back to the original data to find out that although drug X is generally good for a condition, in one’s particular case, in virtue of age, genotype, or whatever else, drug Y is superior.
However, this data by and large does not exist: much of medicine is still at the stage of working out whether something works generally, rather than delving into differential response and efficacy. It is not clear it ever will—humans might be sufficiently similar to one another that for almost all of them one treatment will be the best. The general success of increasing protocolization in medicine is some further weak evidence of this point.
I generally adduce Metamed as an example of rationalist overconfidence: the belief that insurgent Bayesians can just trounce relevant professionals at what they purport to do, thanks to signalling etc. But again, given the expectation was for it to fail (as most start-ups do), this doesn’t provide much evidence. If it had succeeded, I’d have updated much more strongly towards the magic of rationalism meaning you can win and the world being generally dysfunctional.
Formatting note: the brackets for links are greedy, so you need to escape them with a \ to avoid a long link.
I agree that I expect people to be more willing to trade money for face than health for face. I think the system is slanted too heavily towards face, though.
I should also point out that this is mostly a demand side problem. If it were only a supply side problem, MetaMed could have won, but it’s not—people are interested in face more than they’re interested in health (see the example of the outdated brochure that was missing the key medical information, but looked like how a medical brochure is supposed to look).
My understanding is that this is correct for the simple techniques, but incorrect for the complicated techniques. That is, you’re right that a single linear regression can’t replace a GP, but an NLP engine plus a twenty-questions bot plus a causal network probably could. (I unfortunately don’t have any primary sources at hand; medical diagnostics is an interest, but most of the academic citations I know are all machine diagnostics, since that’s what my research was in.)
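A sketch of what I mean by the simple techniques; the features and weights are invented for illustration, since real models are fit to recorded outcome data:

```python
import math

# The sort of "simple technique" the clinical-judgment literature pits
# against human diagnosticians: a fixed linear score over coded findings,
# squashed through a logistic link. Features and weights are invented for
# illustration; real models are fit to recorded outcome data.
WEIGHTS = {"fever": 1.2, "cough": 0.8, "chest_pain": 1.5, "age_over_60": 0.9}
BIAS = -2.5

def p_condition(findings):
    score = BIAS + sum(w for k, w in WEIGHTS.items() if findings.get(k))
    return 1.0 / (1.0 + math.exp(-score))

patient = {"fever": True, "cough": True, "chest_pain": False, "age_over_60": True}
print(f"P(condition) = {p_condition(patient):.2f}")  # 0.60 for this toy case
```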
I should also mention that, from the ML side, the technical innovation of Watson is in the NLP engine. That is, a patient could type English into a keyboard and Watson would mostly understand what they’re saying, instead of needing a nurse or doctor to translate the English into the format needed by the diagnostic tool. The main challenge with uptake of the simple techniques historically was that they only did the final computation, but most of the work in diagnostics is collecting the information from the patient. And so if the physician is 78% accurate and the linear regression is 80% accurate, is it really worth running the numbers for those extra 2%?
From a business standpoint, I think it’s obvious why IBM is moving slowly; just like with self-driving cars, the hard problems are primarily legal and social, not technical. Even if Watson has half the error rate of a normal doctor, the legal liability status is very different, just like a self-driving car that has half the error rate of a human driver would result in more lawsuits for the manufacturer, not less. As well, if the end goal is to replace doctors, the right way to do that is imperceptibly hand more and more work over to the machines, not to jump out of the gate with a “screw you, humans!”
So, just like the Hansonian view of Effective Altruism is that it replaces Pretending to Try not with Actually Trying but with Pretending to Actually Try, if there is sufficient pressure to pretend to care about outcomes then we should expect people to move towards better outcomes, as their pretending has a nonzero effect.
But I think you can look at the historical spread of anesthesia vs. the historical spread of antiseptics to get a sense of the relative importance of physician convenience and patient outcomes. (This is, I think, a point brought up by Gawande.)
I think I agree with your observations about MetaMed’s competition but not necessarily about your interpretation. That is, MetaMed could have easily failed for both the reasons that its competition was strong and that its customers weren’t willing to pay for its services. I put more weight on the latter because the experience that MetaMed reported was mostly not “X doesn’t want to pay $5k for what they can get for free from NICE” but “X agrees that this is worth $100k to them, but would like to only pay me $5k for it.” (This could easily be a selection effect issue, where everyone who would choose NICE instead is silent about it.)
This is why I’m most optimistic about machine medicine, because it basically means instead of going to a doctor (who is tired / stressed / went to medical school twenty years ago and only sort of keeps up) you go to the interactive NICE protocol bot, which asks you questions / looks at your SNPs and tracked weight/heart rate/steps/sleep/etc. data / calls in a nurse or technician to investigate a specific issue, diagnoses the issue and prescribes treatment, then follows up and adjusts its treatment outcome expectations accordingly.
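A toy sketch of what such a protocol bot might look like; the tree here is invented, standing in for a real guideline:

```python
# Toy sketch of a protocol bot: a hard-coded decision tree standing in for
# a NICE-style guideline. Everything here is invented; a real system would
# be generated from guideline text plus the patient's tracked data.
TREE = {
    "q": "Chest pain right now?",
    "yes": {"action": "Call emergency services"},
    "no": {
        "q": "Symptoms for more than 2 weeks?",
        "yes": {"action": "Book GP appointment; order baseline bloods"},
        "no": {"action": "Self-care advice; follow up in 1 week"},
    },
}

def run(node):
    while "action" not in node:
        answer = ""
        while answer not in ("yes", "no"):
            answer = input(node["q"] + " (yes/no) ").strip().lower()
        node = node[answer]
    print("Protocol says:", node["action"])

run(TREE)
```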
(Sorry for delay, and thanks for the formatting note.)
My knowledge is not very up to date re. machine medicine, but I did get to play with some of the commercially available systems, and I wasn’t hugely impressed. There may be a lot more impressive results yet to be released commercially but (appealing back to my priors) I think I would have heard of it as it would be a gamechanger for global health. Also, if fairly advanced knowledge work of primary care can be done by computer, I’d expect a lot of jobs without the protective features of medicine to be automated.
I agree that machine medicine along the lines you suggest will be superior to human performance, and I anticipate this will be achieved (even if I am right and it hasn’t already happened) fairly soon. I think medicine will survive less on the cognitive skill required and more through technical facility and social interactions, where machines comparatively lag (of course, I anticipate they will steadily get better at this too).
I grant a Hansonian account can accommodate the sort of ‘guided by efficacy’ data I suggest via ‘pretending to actually try’ considerations, but I would suggest this almost becomes an epicycle: any data which supports medicine being about healing can be explained away by the claim that doctors are only pretending to be about healing as a circuitous route to signalling. I would say the general ethos of medicine (EBM, the proliferation of trials) looks like a pro tanto reason in favour of medicine being about healing, and divergence from this (e.g. what happened to Semmelweis, other lags) is better explained by doctors being imperfect and selfish, and patients irrational, rather than by both parties adeptly following a signalling account.
But I struggle to see what evidence could neatly distinguish between these cases. If you have an idea, I’d be keen to hear it. :)
I agree with the selection worry re. MetaMed’s customers: they are also presumably selected from people whom modern medicine didn’t help, which may have effects of its own (not to mention making MetaMed’s task harder, as their pool will be harder to treat than unselected-for-failure cases who see the doctor ‘first line’). I’d also (with all respect meant to the staff of MetaMed) suggest they may not be the most objective sources on why it failed: I’d guess people would rather say their startup failed because of the market or product-market fit than ‘actually, our product was straight worse than our competitors’.
I’m not sure there’s much of a difference between the “doctors care about healing, but run into imperfection and selfishness” interpretation and the “doctors optimize for signalling, but that requires some healing as a side effect” interpretation besides which piece goes before the ‘but’ and which piece goes after.
The main difference I do see is that if ‘selfishness’ means ‘status’ then we might see different defection than if ‘selfishness’ means ‘greed.’ I’m not sure there’s enough difference between them for a clear comparison to be made, though. Greedy doctors will push for patients to do costly but unnecessary procedures, but status-seeking doctors will also push for patients to do costly but unnecessary procedures because it makes them seem more important and necessary.
There’s also a MetaMed cofounder making the same case for their failure, here: https://thezvi.wordpress.com/2015/06/30/the-thing-and-the-symbolic-representation-of-the-thing/
Thanks for the detailed reply.
Regarding arguments that the allocation of medical resources, particularly in the U.S., is wasteful and harmful in many cases—I agree in general, though the specifics are messy, and I don’t find Robin’s posts on the matter very well argued*. I’m most interested in this bit:
Particularly since your initial claim that had me raising eyebrows was that MetaMed failed because they have great diagnostics, but medicine doesn’t want good diagnostics.
Edit: *In the RAND post he argues that lower co-pays in a well-insured population resulted in no marginal benefit to health (I’m unconvinced by this, but I’d rather not go there), and therefore that the fact that most studies show a positive effect of medicine is a sham. I’m not sure if he thinks that statins and insulin are a scam, but this is a bold and unjustified conclusion. The RAND experiment is not equipped to evaluate the overall health effects of medicine, and that was not its main purpose—it was for examining healthcare utilization. The specific health effects of common interventions are known by studying them directly, and getting patients to follow the treatment protocols that get those results is, as far as I know, an unsolved problem.
A good place to get started there is Epistemology and the Psychology of Human Judgment, summarized on LW by badger.
Ah, that’s a slightly broader claim than the one I wanted to make. MetaMed, especially early on, optimized for diagnostics and very little else, and so ran into problems like “why is the report I paid $5,000 for so poorly typeset?”. So it’s not that medicine / patients want bad diagnostics ceteris paribus, but that the tradeoffs they make between the various features of medical care make it clear that healing isn’t the primary goal.
As I understand it, the study measured health outcomes at the beginning and end of the study, as well as utilization during the study. The group with lower copays consumed much more medicine than the group with higher copays, but was no healthier. This suggests that the marginal bit of medicine—i.e. the piece that people don’t consume, but would if it were cheaper or do consume but wouldn’t if it were more expensive—doesn’t have a net impact. (Anything that it would do to help is countered by the risks of interacting with the medical system, say.)
I think I should also make it clear that there’s a difference between medicine, the attempt to heal people, and Medicine, the part of our economy devoted to such, just like there’s a distinction between science and Science. One could make a similar claim that Science Isn’t About Discovery, for example, which would seem strange if one is only thinking about “the attempt to gain knowledge” instead of the actual academia-government-industry-journal-conference system. Most of Robin’s work is on medical spending specifically, i.e. medicine as actually practiced instead of how it could be practiced.
“People evaluated this report solely using non-medical considerations” is not the same as “medical considerations aren’t the primary goal” in the way that is normally understood. The non-medical considerations serve as a filter.
I want to read a book with a good story (let’s call that a good book). However, I don’t want to read a good book that will cost me $5000 to read. By your definition, that means that my primary goal is not to read a good book, my primary goal is to read a cheap enough book.
That is not how most people use the phrase “primary goal”.
Which suggests to me that those are the primary goal. Now, you might say “but most people are homo hypocritus, not homo economicus, so ‘primary goal’ should mean ‘stated goal’ instead of ‘actual goal’.” And if that’s your reply, go back and reread all my posts replacing “primary goal” with “actual goal,” because the exact wording isn’t the point.
Your primary goal is your life satisfaction, and good books are only one way to achieve that; if you think you can get more out of $5k worth of spending in other areas than on books, this lines up with my model.
(I will note, though, that one can view many college classes as the equivalent of “spending $5000 to read a good book.”)
The relevant comparison is more like this one: suppose you preferred a poorly reviewed book with a superior cover to a well reviewed book with an inferior cover. Then we could sensibly conclude that you care more about the cover than the reviews, even if you verbally agree that reviews are more likely to be indicative of quality than the cover.
It doesn’t make any more sense with that. Pretty much nobody would say that because they wouldn’t do Y if X wasn’t true, X is their actual goal for Y. Any term that they would use is such that substituting it in makes your original statement not very insightful.
(For instance, most people wouldn’t call X a primary goal or an actual goal, but they might call X a necessary condition. But if you were to say “people found something other than healing to be a necessary condition for buying a report”, that would not really say much that isn’t already obvious.)
I prefer a poorly reviewed book that costs $10 to a well reviewed book that costs $5000. By your reasoning I “care more about the price than about the reviews”.
That’s fighting the hypothetical.
I think this reveals our fundamental disagreement: I am describing people, not repeating people’s self-descriptions, and since I am claiming that people are systematically mistaken about their self-descriptions, of course there should be a disagreement between them!
That is, suppose Alice “goes to restaurants for the food” but won’t go to any restaurants that have poor decor / ambiance, but will go to restaurants that have good ambiance and poor food. If Bob suggests to Alice that they go to a hole-in-the-wall restaurant with great food, and Alice doesn’t like it or doesn’t go, then an outside observer seems correct in saying that Alice’s actual goal is the ambiance.
Now, sure, Alice could be assessing the experience along many dimensions and summing them in some way. But typically there is a dominant feature that overrides other concerns, or the tradeoffs seem to heavily favor one dimension (perhaps there need to be five units of food quality increase to outweigh one unit of ambiance quality decrease), which cashes out to the same thing when there’s a restricted range.
I think you do care more about the price than about the reviews? That is, if there were a book that cost $5k and there were a bunch of people who had read it and said that the experience of reading it was life-changingly good and totally worth $5k, and you decided not to spend the money on the book, it’s clear that you’re not in the most hardcore set of story-chasers, but instead you’re a budget-conscious story-chaser.
To bring it back to MetaMed, oftentimes the work that they did was definitely worth the cost. People pay hundreds of thousands of dollars for treatment of serious conditions, and so the idea of paying five thousand dollars to get more diagnostic work done to make sure the other money is well-spent is not obviously a strange or bad idea, whereas paying $5k for a novel is outlandish.
I don’t see why you think that. You could argue it’s reference class tennis, but if your point is “people don’t do weird thing X” and in fact people do weird thing X in a slightly different context, then we need to reevaluate what is generating the weirdness. If people do actually spend thousands of dollars in order to read a book (and be credentialed for having read it), then a claim that you don’t want to spend for it becomes a statement about you instead of about people in general, or a statement about what features you find most relevant.
(I don’t know your educational history, but suppose I was having this conversation with an English major who voluntarily took college classes on reading books; clearly the class experience of discussing the book, or the pressure to read the book by Monday, is what they’re after in a deeper way than they were after reading the book. If they just cared about reading the book, they would just read the book.)
I’m complaining about your terminology. Terminology is about which meaning your words communicate. Being wrong about one’s self-description is about whether the meaning you intend to communicate by your words is accurate. These are not the same thing and you can easily get one of them wrong independently of the other.
The sentence after the “that is” is a nonstandard definition of “caring more about the price than about the reviews”.
It’s fighting the hypothetical because the hypothetical is that I do not want to pay $5000 for a book. Pointing out that there are situations where people want to pay $5000 for a book disputes whether the situation laid out in the hypothetical actually happens. That’s fighting the hypothetical. Even if you’re correct, whether the situation described in the hypothetical can actually happen is irrelevant to the point the hypothetical is being used to make.
My point is not “people don’t do weird thing X”, my point is that people do not use the term X for the type of situation described in the hypothetical. A situation does not have to actually happen in order for people to use terms to describe it.
Thanks, I’ll try to find the relevant parts.
I didn’t want to get too deep into this discussion, because I don’t actually disagree with the weak conclusion that a lot of people receive too much healthcare and that completely free healthcare is probably a bad idea. But Robin Hanson doesn’t stop there; he concludes that the rest of medicine is a sham and that the fact that other studies show otherwise is a scandal. As to why I don’t buy this: the RAND experiment does not show that health outcomes do not improve. It shows that certain measured metrics do not show a statistically significant improvement in the whole population. In fact, in the original paper, the risk of dying was decreased for the poor high-risk group but not for the entire population. Which brings up a more general problem—such a study is obviously going to be underpowered for any particular clinical question, and it isn’t capable of detecting benefits that lie outside of those metrics.
MetaMed’s Michael Vassar gave a TEDx talk: The legend of healthcare
I’m not buying your elevator pitch. Primarily because lots of data is not nearly enough. You need smart people and, occasionally, very smart people. This means that
is not true because they lack people smart enough to correctly process the data, interpret it, and arrive at the correct conclusions. And
is also not quite true because companies like McKinsey and Bain actually look for and hire very smart people—again, it’s not just data. Besides, in a lot of cases external consultants are used as hatchet men to do things that are politically impossible for the insiders to do, that is, what matters is not their access to data but their status as outsiders.
Sure there is—money. It’s not “pure” capitalism around here, but it is capitalism.
So, what’s wrong with the stock price as the metric?
Besides, evaluating forecasting capability is… difficult. Both theoretically (out of many possible futures only one gets realized) and practically (there is no incentive for people to give you hard predictions they make).
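(For what it’s worth, the theoretical half of that difficulty is exactly what proper scoring rules are designed for: you can’t score a single realized future, but you can score stated probabilities across many resolved questions. A minimal sketch in Python, with made-up numbers:)

```python
# Score a forecaster across many questions rather than one realized future.
# Brier score: mean squared error between stated probabilities and what
# actually happened (0 or 1). Lower is better; 0.25 is the "always say 50%"
# baseline. All numbers here are invented for illustration.
def brier(probs, outcomes):
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)

forecasts = [0.9, 0.7, 0.2, 0.6, 0.1]
what_happened = [1, 1, 0, 0, 0]
print(brier(forecasts, what_happened))  # 0.102, beats the baseline
print(brier([0.5] * 5, what_happened))  # 0.25 baseline
```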
I don’t think that McKinsey’s and Bain’s business is crunching data. I think it is renting out smart people.
(using throwaway account to post this)
Very true.
I was recently involved in a reasonably huge data mining & business intelligence task (that I probably should not disclose). I could say this was an eye-opener, but I am old enough to be cynical and disillusioned so that it was not a surprise.
First, we had some smart people on the team (shamelessly including myself :-); “smart” here almost by definition means experts in programming, software development, and enough mathematics and statistics. They did the software implementation, data extraction, and statistics. Then there were slightly less smart people, but experts in the domain being studied, who were supposed to make sense of the results and write the report. These people were pulled off the team because they were very urgently needed for other projects.
Second, the company bought a very expensive tool for data mining and statistical analysis, and subcontracted another company to extend it with the necessary functionality. The tool did not work as expected; the subcontracted extension was two months late (they finished it at the time the final report should have been made!), and it was buggy and did not work with the new version of the tool.
Third, it was quite clear that the report should be bent towards what the customer wanted to hear (that is not to say it would contain fabricated data—just that the interpretations should be more favourable).
So, those smart people spent their time 1) working around bugs in the software we were supposed to use, 2) writing ad-hoc statistical analysis software to be able to do at least something, 3) analysing data in a domain they were not experts in, and 4) writing the report.
After all this, the report was stellar, the customer extremely satisfied, the results solid, the reasoning compelling.
Had I not been involved and had I not known how much of the potential had been wasted and on how small fraction of the data the analysis had been performed, I would consider the final report to be a nice example of a clever, honest, top level business intelligence job.
So, this problem is NOT one I’m tackling directly (I’m more asking how they can get smart people like you to build that kludge for much cheaper), but the model does indirectly incentivize better BI tools by creating competition directly on forecasting ability, and not just signaling ability.
To be frank, I didn’t expect you to, based on our previous conversations on forecasting. You are too skeptical of it, and haven’t read some of the recent research on how effective it can be in a variety of situations.
Exactly, this is the problem I’m solving.
As I said, the signaling problem. Using previous performance as a metric means that there are lots of good forecasters out there who simply can’t get discovered—right now, it’s signaling all the way down (top companies hire from top colleges, which take from top high schools). Basically, I’m betting that there are lots of organizations and people out there who are good forecasters, but don’t have the right signals to prove it.
You should read the linked article on prediction polls—they weren’t even paying people in Tetlock’s study (only giving gift cards not at all commensurate with the work people were putting in), and they solved the problem to the point where they could beat prediction markets.
From my internal view I’m sceptical of it because I’m familiar with it :-/
Um, hiring from top colleges is not quite all signaling. There is quite a gap between, say, an average Stanford undergrad and an average undergrad of some small backwater college.
Um, I was one of Tetlock’s forecasters for a year. I wasn’t terribly impressed, though. I think it’s a bit premature to declare that they “solved the problem”.
With people who claim to have awesome forecasting power or techniques, I tend to point at financial markets and ask why aren’t they filthy rich.
You’re right, I was assuming things about you I shouldn’t have.
Fair enough. But the point is that they’re going on something like “the average undergrad” and discounting all the outliers. That’s especially problematic in this case, because forecasting is an orthogonal skillset to what it takes to get into a top college.
Markets are one of the best forecasting tools we have, so beating them is hard. But using the market to get these types of questions answered is hard (liquidity issues in prediction markets) so another technique is needed.
What part specifically of that paper do you think was unimpressive?
Not necessarily. Recall that a slight shift in the mean of a normal distribution (e.g. IQ scores) results in strong domination in the tails.
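(This is easy to check numerically. A quick sketch, using IQ-style numbers purely for illustration; a shift of a third of a standard deviation in the mean produces growing over-representation as the cutoff moves out into the tail:)

```python
# Compare tail probabilities of two normal distributions whose means
# differ by five points (one-third of an SD on an IQ-style scale).
from scipy.stats import norm

for cutoff in (115, 130, 145):  # 1, 2, 3 SDs above the baseline mean
    base = norm.sf(cutoff, loc=100, scale=15)
    shifted = norm.sf(cutoff, loc=105, scale=15)
    print(cutoff, round(shifted / base, 1))  # ~1.6x, ~2.1x, ~2.8x
```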
Besides, searching for talent has costs. You’re much better off searching for talent at top tier schools than at no-name colleges hoping for a hidden gem.
What “types of questions” do you have in mind? And wouldn’t liquidity issues be fixed just by popularity?
Let me propose IQ as a common cause leading to correlation. I don’t think the skillsets are orthogonal.
I read it a while ago and don’t remember enough to do a critique off the top of my head, sorry...
That’s the signalling issue—I’m trying to create a better signal so you don’t have to make that tradeoff.
Question Example: “How many units will this product sell in Q1 2016?” (Where this product is something boring, like a brand of toilet paper)
This is a question that I don’t ever see being popular with the general public. If you only have a few experts in a prediction market, you don’t have enough liquidity to update your predictions. With prediction polls, that isn’t a problem.
Why do you call that “signaling”? A top-tier school has a real, actual, territory-level advantage over a backwater college. The undergrads there are different.
I don’t know about that not being a problem. Lack of information is lack of information. Pooling forecasts is not magical.
Because you’re going by the signal (the college name), not the actual thing you’re measuring for (forecasting ability).
I meant a problem for frequent updates. Obviously, fewer participants will lead to less accurate forecasts—but with Brier weighting and extremizing you can still get fairly decent results.
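(To make “Brier weighting and extremizing” concrete, here is a minimal sketch of the kind of aggregation meant; the inverse-Brier weighting and the extremizing exponent are illustrative choices, not the exact algorithm from the paper:)

```python
import numpy as np

def pool(probs, past_briers, a=2.5):
    """Weight forecasters by inverse historical Brier score, then extremize.

    Extremizing pushes the pooled probability away from 0.5, compensating
    for the way averaging washes out independent private information.
    The exponent a is a tunable parameter; values near 2.5 appear in the
    forecasting literature, but it should be fit to the question pool.
    """
    w = 1.0 / np.asarray(past_briers)
    p = float(np.average(probs, weights=w))  # weighted linear pool
    return p**a / (p**a + (1 - p)**a)        # extremize away from 0.5

# Three forecasters; the best track record (lowest Brier) counts most.
print(pool([0.60, 0.70, 0.55], past_briers=[0.10, 0.15, 0.30]))  # ~0.78
```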
Why do you believe that management consulting companies are paid to predict the future?
I actually believe that management consulting companies are paid to help companies make big decisions. I believe this because usually they are hired when a company needs to make a big decision.
Decision theory shows us that a huge portion of making big decisions is making accurate predictions about the future (and the other pieces, such as determining an accurate utility function, are best left to the organizations themselves).
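(A toy version of the claim: once the organization supplies the utility function, the decision itself is just arithmetic over forecast probabilities. All names and numbers below are invented for illustration:)

```python
# Expected-utility choice: the forecast P(state) is the hard input;
# the utilities are supplied by the organization itself.
forecast = {"demand_high": 0.3, "demand_low": 0.7}
utility = {
    ("expand", "demand_high"): 10, ("expand", "demand_low"): -5,
    ("hold",   "demand_high"):  2, ("hold",   "demand_low"):  1,
}

def expected_utility(action):
    return sum(p * utility[(action, state)] for state, p in forecast.items())

best = max(("expand", "hold"), key=expected_utility)
print(best, expected_utility(best))  # hold 1.3 (expand would be -0.5)
```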
Where does it show us that’s true?
More importantly, how do you know that the customers of management consulting believe that’s true? Do you think that the average Fortune 500 CEO invests resources into internal prediction-making in a way that indicates he believes this is true?
I think if the average Fortune 500 CEO believed this to be true, you would see many more internal prediction markets in companies: programs for internal prediction markets sold not as team-building efforts but on actually producing actionable data.
I mean, I’m convinced by the math. You are welcome to disagree with the math, but you’ll have to show me some other math that disproves everything that decision theorists have already figured out.
We have different models here. In my model, prediction markets aren’t used because politics are set up for people who can make excuses—prediction markets would remove the ability of those people to make excuses, so the political factions don’t allow them. Management consulting firms solve this by coming in as an outsider endorsed by the Fortune 500 CEO (therefore bypassing most of the politics) and making those predictions themselves. I’m just trying to bring down the cost of these outsiders, so that the CEO can use them for many more decisions.
The math depends heavily on the axioms that you use. It’s quite easy to choose axioms in a way that gets you the outcome you are looking for. The question is whether those axioms are warranted.
Why can’t the CEO order prediction markets to be created? Do you think the political factions wouldn’t create markets if ordered to do so?
As I said, you’re welcome to show me some axioms that show that forecasting is NOT a huge part of making big decisions.
Because good CEOs understand that buy-in is essential for any project. You can order projects all day and alienate your workforce, but that’s not how Fortune 500 CEOs got to be Fortune 500 CEOs.
The general idea is that big decisions are, in most contexts, made by experts via informed intuition, not by shutting up and calculating. The math you are looking at is shut-up-and-calculate math.
Do you think people get substantially more alienated when the CEO says “let’s do an internal prediction market” than when he transfers the same power to management consultants? Especially when the consultants are suddenly forced by your system to focus on true predictions rather than politically acceptable suggestions?
There’s substantial room for both in prediction polls.
The alienation doesn’t tank the project because it’s not being run by the people being alienated.
I’m deeply interested in this problem.
I’ve got to ask, though.
Isn’t this a niche filled by ‘business intelligence’ and ‘data science’? They call it a lot of different things, sure, but they seem to be operating in the same space; at least, they may seem to, to a non-technical executive. An exception is mid-to-small business—I don’t think there’s a lot of penetration there.
Theoretically, yes.
In practice, most companies with BI dashboards and data science analytics experience more information overload than before, because they don’t have the human capital to make sense of all that information.
There are limited cases (e.g. weather forecasting and website split testing) where the niche is narrow enough that the computer can basically do everything on its own, but computers aren’t at the point yet (and likely won’t be for a long time) where they can use generic data to make complex decisions.
GiveWell already uses expert advice for expedient impact assessments, albeit on a small scale, without using academic know-how, and with a suboptimal choice and choice architecture of their experts. Hope you can improve on it :)
You’ve picked the wrong problem domain for the scoring rules. Brier scoring comes from probability assessment, and there are already more sophisticated approaches to this problem, several levels removed from the mathematical theory and synthesising several theorems.
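(For concreteness, the most common alternative to the Brier rule is the logarithmic rule; both are proper, but they penalize overconfidence very differently. A quick sketch:)

```python
import math

def brier(p, outcome):       # quadratic rule, bounded above by 1
    return (p - outcome) ** 2

def log_score(p, outcome):   # logarithmic rule, unbounded
    return -math.log(p if outcome == 1 else 1.0 - p)

# An overconfident forecaster whose near-certain prediction fails:
for p in (0.9, 0.99, 0.999):
    print(p, brier(p, 0), round(log_score(p, 0), 2))
# Brier tops out at 1; the log score diverges as p -> 1 on a miss,
# so the choice of rule changes how harshly confident misses are treated.
```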
The most proximate implementations of what you are suggesting are either Delphi groups (risk analysis) or prediction markets (mainly the rationalist subculture, but also academic). You probably already know how prediction markets work, and you can look up ‘expert elicitation’ or ‘eliciting expert judgement’ and similar terms if you’re interested. Happy to answer any tougher questions you can’t get answered.
There are structured approaches to Delphi groups which incorporate Bayes’ rule and insights from the psychology of eliciting and structuring expert judgement that you could mimic. There is at least one major corporate consultancy focused on this already, however. AFAIK there are no implementations of this kind on the blockchain. Whether that is a worthwhile competitive advantage is another question.
You have a strategic mindset; I like it. If I’ve interpreted your question accurately, the reason others in the know may not have responded is the XY problem.
Yes, the technology I’m using (prediction polls) is essentially this: Delphi groups weighted by Brier scores. The paper I link to above compares them to a prediction market on the same questions—with proper extremizing algorithms, the prediction poll actually does better (especially early on).
The reason I came up with this solution is that I wanted to use prediction markets for a specific class of impact assessments, but they weren’t suited for the task. Prediction markets require either a group of interested suckers to take the bad bets, or a market maker who is sufficiently interested in the outcome to be willing to take the bad side of ALL the sucker bets. My solution complements prediction markets by being much better in those cases: it avoids the zero-sum game and instead just directly pays experts for their expertise.
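(One hypothetical shape for “directly paying experts”: a base fee plus a bonus linear in how much the expert’s Brier score beats a reference forecast. Because the payment is an affine function of a proper score, reporting true beliefs maximizes expected pay. The fee and rate below are invented for illustration:)

```python
def payout(expert_probs, reference_probs, outcomes, base=50.0, rate=200.0):
    """Base fee plus a bonus for beating the reference forecast on Brier score.

    Keeping the payment linear in the score preserves properness; note the
    bonus can go negative, so a real scheme would need a stake or a floor
    (and a floor slightly weakens the incentive for badly losing forecasters).
    """
    def brier(ps):
        return sum((p - o) ** 2 for p, o in zip(ps, outcomes)) / len(outcomes)
    return base + rate * (brier(reference_probs) - brier(expert_probs))

# Expert beats a know-nothing 50% reference on three resolved questions:
print(payout([0.8, 0.2, 0.9], [0.5, 0.5, 0.5], [1, 0, 1]))  # 94.0
```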