I’m not a statistician, but I happen to have some intuitions and sometimes work out formulas or find them on the web.
I have a bunch of students that took a test each day. The test of each day had a threshold score out of, say, 100 points. Scores under the threshold are considered insufficient.
I don’t know which of the two is true:
1. I can either use the tests to evaluate the students, or the students to evaluate the tests.
2. I can evaluate the students using the tests and the tests using the students at the same time.
Option 2 seems counterintuitive at first sight, especially if one wants to be epistemically sound. It seems more intuitive at second sight, though. I think it might be analogous to how you can evaluate a circular flow of feedback by using linear algebra (cf. LW 2.0 discussions).
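One way to make the linear-algebra intuition concrete, as a sketch rather than a recommendation: assume every student took every test, and that a score is roughly a baseline plus a student effect plus a test effect. Under that additive assumption, the least-squares estimates are just row and column means, so students and tests are evaluated from the same table at once. The name `fit_additive` and the scores are made up for illustration.

```python
from statistics import mean

def fit_additive(scores):
    """Fit score[i][j] ~ grand + ability[i] + easiness[j] by least squares.

    Assumes a complete table (every student took every test); with
    sum-to-zero effects the solution reduces to row and column means.
    """
    grand = mean(x for row in scores for x in row)
    ability = [mean(row) - grand for row in scores]          # one per student
    easiness = [mean(col) - grand for col in zip(*scores)]   # one per test
    return grand, ability, easiness

# Hypothetical data: rows are students, columns are daily tests.
scores = [
    [70, 50, 60],
    [90, 70, 80],
    [50, 30, 40],
]
grand, ability, easiness = fit_additive(scores)
print(grand, ability, easiness)
```

If some students missed some tests, the means no longer give the least-squares answer and the same model would have to be solved as a proper linear system.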
Some other context: in my evaluation model I would rather not only consider whether the scores were sufficient or not, but also by how much they were sufficient or insufficient, possibly after suitably transforming them. Also, I want the weights of the scores to decay exponentially. I would also rather use a Bayesian approach.
Is this reasonable, and where can I find instructions on how to do so?
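For concreteness, here is one possible reading of "how much they were sufficient" combined with exponentially decaying weights. It is only a sketch under assumptions not stated above: the margin is taken as score minus threshold with no further transformation, tests are one day apart, and older tests are the ones that count less. The name `decayed_margin` and the `half_life_days` parameter are hypothetical.

```python
def decayed_margin(scores, thresholds, half_life_days=7.0):
    """Exponentially weighted mean of (score - threshold).

    scores and thresholds are ordered oldest first, one entry per day.
    A test that is `age` days old gets weight 0.5 ** (age / half_life_days).
    """
    n = len(scores)
    weights = [0.5 ** ((n - 1 - k) / half_life_days) for k in range(n)]
    margins = [s - t for s, t in zip(scores, thresholds)]
    return sum(w * m for w, m in zip(weights, margins)) / sum(weights)

# Hypothetical student: failed by 20 points yesterday, passed by 20 today.
recent_weighted = decayed_margin([40, 80], [60, 60], half_life_days=1.0)
print(recent_weighted)  # positive, because the newer test weighs more
```

This is a plain weighted average, not a Bayesian estimate; it only shows what the decay does to the margins.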
You have an experimental design problem: https://en.wikipedia.org/wiki/Design_of_experiments.
The way that formalism would think about your problem is that you have two “treatments” (type of test, which you can vary, and type of student) and an “outcome” (how a given student does on a given test, typically summarized as some sort of histogram that’s hopefully shaped like a bell).
Your goal is to efficiently vary “treatment” values to learn as much as possible about the causal relationship between test structure, student quality, and the outcome.
There’s reading you can do on this problem; it’s a classical problem in statistics. Both Jerzy Neyman and Ronald Fisher wrote a lot about this, and the latter has a famous book.
In fact, in some sense this is the problem of statistics, in the sense that modern statistics could be said to have grown out of, and generalized from, this problem.
In your opinion, what is a reasonable price to have a statistician write me a formula for this?
i do statistical consulting as part of my day job responsibilities, i’m afraid to say this is not how it works.
if you came to me with this question i would roll back to ask what exactly you are trying to achieve with the analyses, before getting into the additional constraints you want to include. unfortunately it’s far more challenging if the data owner comes to the statistician after the data are collected rather than before (when the principles of experimental design that ilya mentioned can be applied so that those questions can actually be answered with statistical methods).
that said, temporarily ignoring the additional constraints you mentioned (e.g. whether and how to transform data; exponential decay and what that actually means with respect to student evaluation scores; magic word “bayes”) perhaps a useful search term would be “item response theory”.
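To give a flavour of what that search term leads to, below is a minimal sketch of the simplest item response model (the Rasch model), in which the probability of passing is a logistic function of student ability minus test difficulty, and both are estimated from the same pass/fail table. Assumptions to flag: scores are reduced to pass/fail against the threshold, the fit is plain gradient ascent with a Normal prior on every parameter (so it is a MAP estimate, not a full Bayesian analysis), and `fit_rasch`, `prior_sd`, and the data are invented for illustration.

```python
import math

def fit_rasch(passed, steps=2000, lr=0.1, prior_sd=2.0):
    """MAP fit of P(pass) = sigmoid(ability[i] - difficulty[j]).

    passed[i][j] is True if student i reached the threshold on test j.
    Independent Normal(0, prior_sd) priors keep the estimates finite even
    for students who passed (or failed) every test.
    """
    n_students, n_tests = len(passed), len(passed[0])
    ability = [0.0] * n_students
    difficulty = [0.0] * n_tests
    for _ in range(steps):
        # Gradient of the log-prior.
        grad_a = [-a / prior_sd**2 for a in ability]
        grad_d = [-d / prior_sd**2 for d in difficulty]
        # Gradient of the log-likelihood: observed minus predicted.
        for i in range(n_students):
            for j in range(n_tests):
                p = 1.0 / (1.0 + math.exp(difficulty[j] - ability[i]))
                resid = passed[i][j] - p  # True/False act as 1/0
                grad_a[i] += resid
                grad_d[j] -= resid
        ability = [a + lr * g for a, g in zip(ability, grad_a)]
        difficulty = [d + lr * g for d, g in zip(difficulty, grad_d)]
    return ability, difficulty

# Hypothetical pass/fail table: rows are students, columns are tests.
passed = [
    [True, True, True],
    [True, True, False],
    [True, False, False],
    [False, False, False],
]
ability, difficulty = fit_rasch(passed)
print(ability, difficulty)
```

Real item response software handles missing data, uncertainty in the estimates, and richer models; this only shows that students and tests can be placed on one scale from the same data.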
good luck
Don’t know. Ask a statistician who knows about design.
From a Bayesian perspective, you calculate P(S|T) and P(T|S) at the same time, so it doesn’t really matter. What does matter, and greatly, are your starting assumptions and models: if you have only one for each entity, you won’t be able to calculate how much some datum is evidence of your model or not.
Sorry, I don’t follow. What do you mean by starting assumptions and models, and why should I have more than one for each entity?
Well, to calculate P(T|S) = p you need a model of how a student ‘works’, such that the test result T happens for the kind of student S with probability p. Or you can calculate P(S|T), thereby having a model of how a test ‘works’ by producing the kind of student S with probability p.
If you have only one of those, these are the only things you can calculate.
If on the other hand you have one or more complementary models (complementary here means that they exclude each other and form a complete set), then you can calculate the probabilities P(T1|S1), P(T1|S2), P(T2|S1) and P(T2|S2). With these numbers, via Bayes, you have both P(T|S) and P(S|T), so it’s up to you to decide if you’re analyzing students or tests.
Usually one is more natural than the other, but it’s up to you, since they’re models anyway.
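A small numeric sketch of that inversion, with everything in it assumed for illustration: S1 and S2 stand for two mutually exclusive kinds of student, T1 and T2 for two test results, and the prior and the four conditional probabilities are invented numbers. Note that a prior P(S) is needed in addition to the four P(T|S) values.

```python
def posterior_over_students(prior, likelihood, test_result):
    """Bayes' rule: return P(S=s | T=test_result) for every s.

    prior[s] = P(S=s); likelihood[s][t] = P(T=t | S=s).
    """
    joint = {s: prior[s] * likelihood[s][test_result] for s in prior}
    evidence = sum(joint.values())  # P(T=test_result)
    return {s: joint[s] / evidence for s in joint}

# Hypothetical numbers, not taken from any data.
prior = {"S1": 0.5, "S2": 0.5}
likelihood = {
    "S1": {"T1": 0.8, "T2": 0.2},
    "S2": {"T1": 0.3, "T2": 0.7},
}
post = posterior_over_students(prior, likelihood, "T1")
print(post)  # S1 becomes more probable after seeing T1
```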