abcdef comments on Open thread, September 25 - October 1, 2017

abcdef 28 Sep 2017 12:21 UTC
0 points
I’m not a statistician, but I happen to have some intuitions and sometimes work out formulas or find them on the web.

I have a bunch of students who took a test each day. Each day’s test had a threshold score out of, say, 100 points. Scores under the threshold are considered insufficient.

I don’t know which of the two is true:
1. I can either use the tests to evaluate the students, or the students to evaluate the tests.
2. I can evaluate the students using the tests and the tests using the students at the same time.
Option 2 seems counterintuitive at first sight, especially if one wants to be epistemically sound. It seems more intuitive at second sight, though. I think it might be analogous to how you can evaluate a circular flow of feedback by using linear algebra (cf. LW 2.0 discussions).
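
As a rough illustration of option 2, here is a minimal sketch of a Rasch-type model, the simplest model in item response theory (the search term suggested further down this thread). Everything in it is an assumption made for illustration: the pass/fail data are invented, `fit_rasch` is a hypothetical helper, and the Gaussian penalty (`prior_precision`) is only one possible choice of prior. Each student gets an ability and each test a difficulty, and both sets of numbers are estimated from the same table of results.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def fit_rasch(passed, steps=2000, lr=0.05, prior_precision=1.0):
    """Estimate student abilities and test difficulties jointly.

    passed[i][j] is 1 if student i reached the threshold on test j, else 0.
    Model (an assumption): P(pass) = sigmoid(ability_i - difficulty_j),
    with a zero-mean Gaussian prior on every parameter (MAP estimate).
    """
    n_students, n_tests = len(passed), len(passed[0])
    ability = [0.0] * n_students
    difficulty = [0.0] * n_tests
    for _ in range(steps):
        # Gradient of the log-posterior: prior term first, then the data.
        grad_a = [-prior_precision * a for a in ability]
        grad_d = [-prior_precision * d for d in difficulty]
        for i in range(n_students):
            for j in range(n_tests):
                residual = passed[i][j] - sigmoid(ability[i] - difficulty[j])
                grad_a[i] += residual
                grad_d[j] -= residual
        ability = [a + lr * g for a, g in zip(ability, grad_a)]
        difficulty = [d + lr * g for d, g in zip(difficulty, grad_d)]
    return ability, difficulty


# Invented data: rows are students, columns are tests.
passed = [
    [1, 1, 1, 1],  # student 0 passes everything
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [0, 0, 0, 1],  # student 3 passes only the last test
]
ability, difficulty = fit_rasch(passed)
```

This version only uses pass/fail, so it ignores by how much a score cleared or missed the threshold; handling the margins would need a different likelihood.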

Some other context: In my evaluation model I would rather not only consider whether the scores were sufficient or not, but also by how much they were above or below the threshold, possibly after suitably transforming them. Also, I want the weights of the scores to decay exponentially. I would also rather use a Bayesian approach.
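
To make the weighting idea concrete, here is a minimal sketch of one possible reading: take the margin (score minus threshold) for each day and average the margins with weights that halve every `half_life_days`. The function name, the half-life parameter, and the choice to weight the most recent test highest are all assumptions, not something I have settled on.

```python
def decayed_margin(scores, thresholds, half_life_days=7.0):
    """Weighted mean of (score - threshold), newest test weighted most.

    scores and thresholds are ordered oldest to newest, one entry per day.
    The weight of a test halves for every half_life_days of age.
    """
    n = len(scores)
    decay = 0.5 ** (1.0 / half_life_days)
    # The newest test has age 0 and weight 1; older tests get smaller weights.
    weights = [decay ** (n - 1 - day) for day in range(n)]
    margins = [s - t for s, t in zip(scores, thresholds)]
    return sum(w * m for w, m in zip(weights, margins)) / sum(weights)


# Invented example: one insufficient score followed by one sufficient score.
recent_form = decayed_margin([50, 70], [60, 60], half_life_days=1.0)
```

A positive result means the student is, on recent evidence, above the threshold on average.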

Is this reasonable, and where can I find instructions on how to do so?
- IlyaShpitser 28 Sep 2017 14:08 UTC
  3 points
  Parent
  You have an experimental design problem: https://en.wikipedia.org/wiki/Design_of_experiments.
  
  The way that formalism would think about your problem is you have two “treatments” (type of test, that you can vary, and type of student), and an “outcome” (how a given student does on a given test, typically some sort of histogram that’s hopefully shaped like a bell).
  
  Your goal is to efficiently vary “treatment” values to learn as much as possible about the causal relationship between how you structure a test, and student quality, and the outcome.
  
  There’s reading you can do on this problem; it’s a classical problem in statistics. Both Jerzy Neyman and Ronald Fisher wrote a lot about this; the latter has a famous book.
  
  In fact, in some sense this is the problem of statistics, in the sense that modern statistics could be said to have grown out of, and generalized from, this problem.
  - abcdef 29 Sep 2017 12:40 UTC
    0 points
    Parent
    In your opinion what is a reasonable price to have a statistician write me a formula for this?
    - username2 29 Sep 2017 15:40 UTC
      0 points
      Parent
      i do statistical consulting as part of my day job responsibilities, i’m afraid to say this is not how it works.
      
      if you came to me with this question i would roll back to ask what exactly you are trying to achieve with the analyses, before getting into the additional constraints you want to include. unfortunately it’s far more challenging if the data owner comes to the statistician after the data are collected rather than before (when principles of experimental design as ilya mentioned can be considered to achieve ability to successfully answer those questions using statistical methods).
      
      that said, temporarily ignoring the additional constraints you mentioned (e.g. whether and how to transform data; exponential decay and what that actually means with respect to student evaluation scores; magic word “bayes”) perhaps a useful search term would be “item response theory”.
      
      good luck
    - IlyaShpitser 29 Sep 2017 14:37 UTC
      0 points
      Parent
      Don’t know. Ask a statistician who knows about design.
- MrMind 29 Sep 2017 9:35 UTC
  0 points
  Parent
  From a Bayesian perspective, you calculate P(S|T) and P(T|S) at the same time, so it doesn’t really matter. What does matter, and greatly, are your starting assumptions and models: if you have only one for each entity, you won’t be able to calculate how much some datum is evidence of your model or not.
  - abcdef 29 Sep 2017 12:41 UTC
    0 points
    Parent
    Sorry, I don’t follow. What do you mean by starting assumptions and models, and why should I have more than one for each entity?
    - MrMind 29 Sep 2017 15:34 UTC
      0 points
      Parent
      Well, to calculate P(T|S) = p you need a model of how a student ‘works’, in such a way that the test’s result T happens for the kind of students S with probability p. Or you can calculate P(S|T), thereby having a model of how a test ‘works’ by producing the kind of student S with probability p.
      If you have only one of those, these are the only things you can calculate.
      
      If on the other hand you have one or more complementary models (complementary here means that they exclude each other and form a complete set), then you can calculate the probabilities P(T1|S1), P(T1|S2), P(T2|S1) and P(T2|S2). With these numbers, via Bayes, you have both P(T|S) and P(S|T), so it’s up to you to decide if you’re analyzing students or tests.
      Usually one is more natural than the other, but it’s up to you, since they’re models anyway.
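
The step from P(T|S) to P(S|T) described above can be sketched in a few lines. This is only an illustration: the two student types, the two test outcomes, and every probability below are invented numbers, and `posterior` is a hypothetical helper name.

```python
def posterior(prior, likelihood, outcome):
    """Return P(S | T=outcome) from P(S) and P(T | S) via Bayes' rule."""
    joint = {s: prior[s] * likelihood[s][outcome] for s in prior}
    evidence = sum(joint.values())  # P(T=outcome), summed over student types
    return {s: p / evidence for s, p in joint.items()}


# Invented numbers: two complementary student models and two test outcomes.
prior = {"S1": 0.5, "S2": 0.5}
likelihood = {
    "S1": {"T1": 0.8, "T2": 0.2},  # P(T1|S1), P(T2|S1)
    "S2": {"T1": 0.3, "T2": 0.7},  # P(T1|S2), P(T2|S2)
}
post = posterior(prior, likelihood, "T1")
```

With these numbers, observing T1 shifts the probability of S1 from 0.5 to 8/11, about 0.73. The same table, read row by row, is the "model of the student"; read through Bayes' rule, it is the "model of the test".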