We’ve actually noticed in our weekly sessions that our nice official-looking yes-we’re-gathering-data rate-from-1-to-5 feedback forms don’t seem to correlate with how much people seem to visibly enjoy the session—mostly the ratings seem pretty constant. (We’re still collecting useful data off the verbal comments.) If anyone knows a standard fix for this then PLEASE LET US KNOW.
I’d suggest measuring the Net Promoter Score (NPS) (link). It’s used in business as a better measure of customer satisfaction than more traditional measures. See here for evidence; sorry for the not-free link.
“On a scale of 0-10, how likely would you be to recommend the minicamp to a friend or colleague?”
“What is the most important reason for your recommendation?”
To interpret, split the responses into 3 groups:
9-10: Promoter—people who will be active advocates.
7-8: Passive—people who are generally positive, but aren’t going to do anything about it.
0-6: Detractor—people who are lukewarm (which will turn others off) or who will actively advocate against you.
NPS = [% who are Promoters] - [% who are Detractors]. Good vs. bad NPS varies by context, but a score in the +20-30% range is generally very good. The follow-up question is a good way to identify key strengths and high-priority areas to improve.
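For concreteness, here’s a minimal sketch of that calculation in Python (the ratings below are made up for illustration):

```python
def nps(ratings):
    """Net Promoter Score from a list of 0-10 'how likely to recommend' answers."""
    promoters = sum(1 for r in ratings if r >= 9)   # 9-10
    detractors = sum(1 for r in ratings if r <= 6)  # 0-6
    return 100 * (promoters - detractors) / len(ratings)

# Hypothetical responses from one minicamp cohort
ratings = [10, 9, 9, 8, 8, 7, 7, 6, 5, 10]
print(nps(ratings))  # 4 promoters, 2 detractors out of 10 -> +20
```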
NPS is a really valuable concept. Means and medians are pretty worthless compared to identifying the percentage in each class, and it’s sobering to realize that a 6 is a detractor score.
(Personal anecdote: I went to a movie theater, watched a movie, and near the end, during an intense confrontation between the hero and villain, the film broke. I was patient, but when they sent me an email later asking me the NPS question, I gave it a 6. I mean, it wasn’t that bad. Then two free movie tickets came in the mail, with a plea to try them out again.
I hadn’t realized it, but I had already put that theater in my “never go again” file, since why give them another chance? I then read The Ultimate Question for unrelated reasons, and had that experience in my mind the whole time.)
Good anecdote. It made me realize that I had just 20 minutes ago made a damning non-recommendation to a friend based on a single bad experience after a handful of good ones.
Here is the evidence paper.
Right, I’d forgotten about that. I concur that it is used, and I work in market research, sort of.
Another thing you could do is measure in a more granular way—ask for NPS about particular sessions. You could do this after each session or at the end of each day. This would help you narrow down what sessions are and are not working, and why.
You do have to be careful not to overburden people by asking them for too much detailed feedback too frequently; otherwise they’ll get survey fatigue and the quality of responses will markedly decline. Hence, I would resist the temptation to ask more than 1-2 questions about any particular session. If there are any that are particularly well or poorly received, you can follow up on those later.
One idea (which you might be doing already) is making the people collecting the data DIFFERENT from the people organizing/running the sessions.
For example, if Bob organizes and runs a session, and everyone likes Bob, but thinks that the session was so-so, they may be less willing to write negative things down if they know Bob is the one collecting and analyzing data.
If Bob runs the sessions, then SALLY should come in at the end and say something like “Well, we want to make these better, so I’M the one gathering information on ways to improve,” etc.
Even if Bob eventually gets the negative information, I think people might be more likely to provide it to Sally (one step removed) than to Bob directly.
(Even better: Nameless Guy organizes the session, and Bob teaches it, making sure everyone knows this is NAMELESS’ session and Bob is just the mouthpiece.)
Also, I would say that verbal comments are generally MUCH more useful than Likert-scale information anyway. It’s better to be getting good comments and bad Likert scores than vice versa.
Back when I did training for a living, my experience was that those forms were primarily useful for keeping my boss happy. The one question that was sometimes useful was asking people what they enjoyed most and least about the class, and what they would change about it. Even more useful was asking that question of people to their faces. Most useful was testing to determine what they had actually learned, if anything.
I’ve seen “rate from 1 to 5, with 3 excluded”, which should be equivalent to “rate from 1 to 4” but feels substantially different. But there are probably better ones.
In this category of tricks, somebody (I forget who) used a rating scale where you assigned a score of 1, 3, or 9, which should be equivalent to “rate from 1 to 3”, but...
We weren’t getting a lot of threes, but maybe that works anyway.
Then maybe “1 to 4, excluding 3” or “1 to 5, excluding 4”, to rule out the lazy answer “everything’s basically fine”. That might force people to find an explanation whenever they feel the thing is good but not perfect.
If you start getting 5s too frequently, then it’s probably not a good trick.
Why not go all the way and just use a plus-minus-zero system like LW ratings (and much of the rest of the internet)? YouTube had an interesting chart, before they switched from the 5-star rating system to the like-dislike system, showing how useless the star ratings were. But that’s non-mandatory, so it’s very different.
You could have a rubric without any numbers, just 10 sentences or so where participants could circle those that apply. E.g. “I learned techniques in this session that I will apply at least once a week in my everyday life”, “Some aspects of this session were kind of boring”, “This session was better presented than a typical college lecture”, etc.
You could try a variant of this (give someone a d10 and a d6, hide the roll from the surveyor; if the d6 comes up 1 they report the d10 as their 1-10 rating, and are otherwise honest), but this may not be useful in cases where people aren’t deliberately lying to you, and it’s probably only worth it if you have enough sample size to wash out the random answers and can afford to effectively throw out a sixth of your data.
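If you did go that route, here’s a rough sketch of how you might correct for the noise, assuming a fair d6 (so an expected one sixth of answers are just the d10 roll, mean 5.5), rather than literally discarding a sixth of the data:

```python
def estimate_honest_mean(responses, p_noise=1/6, noise_mean=5.5):
    """Back out the mean of the honest answers when a known fraction of
    responses were replaced by a uniform d10 roll (mean 5.5 on a 1-10 scale).

    observed_mean = (1 - p_noise) * honest_mean + p_noise * noise_mean
    """
    observed_mean = sum(responses) / len(responses)
    return (observed_mean - p_noise * noise_mean) / (1 - p_noise)

# Hypothetical batch of 1-10 ratings, some of which are secretly d10 rolls
responses = [8, 7, 9, 2, 8, 6, 10, 7, 8, 5, 7, 9]
print(round(estimate_honest_mean(responses), 2))  # 7.5
```

The payoff is deniability for any individual answer; the cost, as noted, is extra variance, so you need a decent sample before the noise averages out.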
Or weight the die.
I’m not a pro, but you probably want to turn the data into a z-score (e.g. “this class is rated 3 standard deviations above the average rating for other self-help classes”). If you can’t turn it into a z-score, the data is probably meaningless.
Also, maybe use some other ranking system. I imagine that people have a mindless cached procedure for doing these rankings that you might want to interrupt, to force them to actually evaluate it (rank is a random variable with mean = 7 and stddev = 1).
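A minimal sketch of the z-score normalization suggested above, assuming you have average ratings for a handful of comparable classes to use as a baseline (all numbers invented):

```python
from statistics import mean, stdev

# Hypothetical mean 1-10 ratings for other self-help classes, as a baseline
other_classes = [6.8, 7.0, 7.1, 6.9, 7.2, 7.0]
this_class = 7.4

baseline_mean = mean(other_classes)
baseline_sd = stdev(other_classes)  # sample standard deviation

z = (this_class - baseline_mean) / baseline_sd
print(f"z = {z:.1f}")  # roughly how many SDs above a typical class this one sits
```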