You can turn any kind of analysis (which returns a scalar) into a p-value by generating a zillion fake data sets assuming the null hypothesis, analysing them all, and checking on what fraction of the fake data sets your statistic exceeds that of the real data set.
This doesn’t sound true to me. How do you know the underlying distribution of the null when it’s just something like “these variables are independent”?
If you’re working with composite hypotheses, replace “your statistic” with “the supremum of your statistic over the relevant set of hypotheses”.
If there are infinitely many hypotheses in the set then the algorithm in the grandparent doesn’t terminate :).
What I was saying was sort of vague, so I’m going to formalize here.
Data is coming from some random process X(θ,ω), where θ parameterizes the process and ω captures all the randomness. Let’s suppose that for any particular θ, living in the set Θ of parameters where the model is well-defined, it’s easy to sample from X(θ,ω). We don’t put any particular structure (in particular, cardinality assumptions) on Θ. Since we’re being frequentists here, nature’s parameter θ′ is fixed and unknown. We only get to work with the realization of the random process that actually happens, X′ = X(θ′,ω′).
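To make this concrete, here’s a minimal Python sketch of such a samplable process. The model (n i.i.d. N(θ,1) draws) and the name sample_X are illustrative assumptions of mine, not anything canonical:

    import numpy as np

    rng = np.random.default_rng(0)

    def sample_X(theta, n=100):
        """One realization of X(theta, omega): here, n i.i.d. N(theta, 1) draws."""
        return rng.normal(loc=theta, scale=1.0, size=n)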
We have some sort of analysis t(⋅) that returns a scalar; applying it to the random data gives us the family of random variables t(X(θ,ω)), which is still parameterized by θ and still easy to sample from. We pick some null hypothesis Θ0 ⊂ Θ, usually for scientific or convenience reasons.
We want some measure of how weird/surprising the value t(X′) is if θ′ were actually in Θ0. One way to do this, if we have a simple null hypothesis Θ0 = { θ0 }, is to calculate the p-value p(X′) = P(t(X(θ0,ω)) ≥ t(X′)). This can clearly be approximated using samples from t(X(θ0,ω)).
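Continuing the toy sketch above (the statistic t and the helper p_value_simple are made-up illustrations), the Monte Carlo approximation is just an empirical tail frequency:

    def t(x):
        """The analysis: any scalar summary of the data; here, the sample mean."""
        return x.mean()

    def p_value_simple(x_obs, theta0, n_sims=10_000):
        """Monte Carlo p-value for the simple null theta = theta0: the fraction
        of simulated statistics at least as large as the observed one."""
        t_obs = t(x_obs)
        t_null = np.array([t(sample_X(theta0, n=len(x_obs))) for _ in range(n_sims)])
        return (t_null >= t_obs).mean()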
For composite null hypotheses, I guessed that using p(X′) = sup_{θ0 ∈ Θ0} P(t(X(θ0,ω)) ≥ t(X′)) would work. Paraphrasing jsteinhardt, if Θ0 = { θ01, …, θ0n }, you could approximate p(X′) using samples from t(X(θ01,ω)), …, t(X(θ0n,ω)), but it’s not clear what to do when Θ0 has infinite cardinality. I see two ways forward.

One is approximating p(X′) by doing the above computation over a finite subset of points in Θ0, chosen by gridding or at random. This should give an approximate lower bound on the p-value, since it might miss the θ0 where the observed data look most unexceptional. If the approximate p-value leads you to fail to reject the null, you can believe it; if it leads you to reject the null, you might be less sure and might want to continue trying more points in Θ0. Maybe this is what jsteinhardt means by saying it “doesn’t terminate”?

The other way forward might be to use features of t and Θ0, which we do have some control over, to simplify the expression sup_{θ0 ∈ Θ0} P(t(X(θ0,ω)) ≥ c). Say, if P(t(X(θ0,ω)) ≥ c) is convex in θ0 and Θ0 is a bounded convex polytope living in some Euclidean space, then, since a convex function on a polytope attains its maximum at a vertex, the supremum only depends on how P(t(X(θ0,ω)) ≥ c) behaves at a finite number of points.
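In the same toy setup, the gridding approach might look like the sketch below; the grid is an arbitrary choice of mine, and the result can only undershoot the true supremum:

    def p_value_composite(x_obs, theta_grid, n_sims=10_000):
        """Approximate sup over Theta_0 of the per-theta0 p-value by maximizing
        over a finite grid; the grid may miss the theta0 that makes the observed
        data look least surprising, so this is (approximately) a lower bound."""
        return max(p_value_simple(x_obs, th, n_sims) for th in theta_grid)

    # e.g. Theta_0 = [-0.5, 0.5], gridded at 21 points
    x_obs = sample_X(0.3)
    print(p_value_composite(x_obs, np.linspace(-0.5, 0.5, 21)))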
So yeah, things are far more complicated than I claimed, as I realize now that I’ve worked through it. But you can do sensible things even with a composite null.
Yup I agree with all of that. Nice explanation!