Peter_de_Blanc comments on Open Thread: March 2010

Peter_de_Blanc Mar 7, 2010, 9:57 PM
1 point
OK, I’ll use the same model I use for text. The zeroth-order model is maxentropy, and the kth-order model is a k-gram model with a pseudocount of 2 (the alphabet size) allocated to the (k-1)th-order model.

In this case, since there’s never before been a Thursday in which she did not call, we default to the 1st-order model, which says the probability is ³⁄₄ that she will come on Friday.
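
One way to read the model described above is as a recursion: the order-k estimate adds the observed k-gram counts to a pseudocount equal to the alphabet size, and that pseudocount is spread according to the order-(k-1) estimate. The sketch below is only that reading, not a confirmed implementation. The function name `backoff_prob` and the toy history (two earlier Fridays, both with a visit) are assumptions chosen so that the order-1 answer reproduces the ³⁄₄ quoted above.

```python
from fractions import Fraction


def backoff_prob(history, symbol, alphabet_size, order):
    """P(next == symbol) under the order-`order` model (hypothetical helper).

    Order 0 is uniform (maximum entropy). Order k conditions on the last
    k-1 symbols and gives a pseudocount of `alphabet_size` to order k-1.
    """
    if order == 0:
        return Fraction(1, alphabet_size)
    ctx_len = order - 1
    if ctx_len > len(history):
        # Not enough history to form the context: use the lower order.
        return backoff_prob(history, symbol, alphabet_size, order - 1)
    context = tuple(history[len(history) - ctx_len:])
    ctx_count = 0  # times the context was seen with a following symbol
    hit_count = 0  # times that following symbol was `symbol`
    for i in range(len(history) - ctx_len):
        if tuple(history[i:i + ctx_len]) == context:
            ctx_count += 1
            if history[i + ctx_len] == symbol:
                hit_count += 1
    lower = backoff_prob(history, symbol, alphabet_size, order - 1)
    return (hit_count + alphabet_size * lower) / (ctx_count + alphabet_size)


# Assumed toy data: she visited on both earlier Fridays.
fridays = ["visit", "visit"]
print(backoff_prob(fridays, "visit", 2, 1))  # 3/4
```

When the context has never been seen, `ctx_count` and `hit_count` are both zero and the result collapses to the lower-order estimate, which matches the "default to the 1st-order model" step above.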
- [deleted] Mar 8, 2010, 1:13 AM
  3 points
  I beg your pardon?
- Douglas_Knight Sep 22, 2010, 1:56 AM
  0 points
  
  > OK, I’ll use the same model I use for text. The zeroth-order model is maxentropy, and the kth-order model is a k-gram model with a pseudocount of 2 (the alphabet size) allocated to the (k-1)th-order model.
  
  Is this a standard model? Does it have a name or a reference?
  I see that the level 1 model is Laplace’s rule of succession. Is there some clean statement about the level k model? Is this a Bayesian update?
  
  > In this case, since there’s never before been a Thursday in which she did not call, we default to the 1st-order model, which says the probability is ³⁄₄ that she will come on Friday.
  
  You seem to be treating the string as being labeled by alternating Thursdays and Fridays, which have letters drawn from different alphabets. The model easily extends to this, but it was probably worth saying, particularly since the two alphabets happen to have the same size.
  
  I find it odd that almost everyone treated weeks as discrete events. In this problem, days seem like the much more natural unit to me. ata probably agrees with me, but he didn’t reach a conclusion. With weeks, we have very few observations, so a lot depends on our model, like whether we use alphabets of size 2 for Thursday and Friday (Peter), or whether we use alphabets of size 4 for the whole week (wnoise). I’m going to allow calls and visits on each day and use an alphabet of size 4 for each day. I think it would be better to use a Peter-ish system of separating morning visits from evening calls, but with data indexed by days, we have a lot of data, so I don’t think this matters so much.
  
  I’ll run my weeks Sun-Sat. Weeks 1 and 2 are complete and week 3 is partial. I treat days as independent, each with 4 outcomes: ([no]visit)x([no]call), and I interpret the unspecified days as having no call and no visit. Using Laplace’s rule of succession, we have a ⁴⁄₂₃ chance of a visit, which sounds pretty reasonable to me. But if we use Peter’s hierarchical model, I think our chance of a visit is 4/23*4/17*4/14*4/11*4/8*4/5 ≈ 0.17%, on the order of ¹⁄₅₀₀. That is, since we’ve never seen a visit after a no-call/no-visit day, the only way to get a visit is from level 1 of the model, so we multiply the chance of falling through from level 2 to level 1, from level 3 to level 2, and so on. The chance of falling through from level n+1 to level n is 4/(4+c), where c is the number of times we’ve seen an (n+1)-gram that continues the last n days. So for n=5, the last 5 days were no-visit-no-call, which we’ve seen once before, culminating in the no-visit-call Thursday of the second week. So that’s our factor of ⁴⁄₅. For n=4, we’ve seen the resolution of 4 consecutive days of no-visit-no-call four times: once in the first week, twice in the second week, and once in the third week; so that’s the ⁴⁄₈.
  
  ¹⁄₅₀₀ seems awfully small to me. Am I using this model correctly? I like level 2, 4/23*4/17 ≈ 4%, but maybe I’m implicitly getting “2” from a prior that the call is connected to the visit.
  
  With Peter’s two alphabets, each of size two, level 1 yields ³⁄₂₁, level 2 yields 3/21*2/18 ≈ 2%, and the full model yields 3/21*2/18*2/16*2/15*2/13*2/12*2/10*2/9*2/7*2/6*2/4*2/4 ≈ 10^-8. Levels 1 and 2 were a little smaller than with the size 4 alphabet, but the full model was much smaller. I was expecting the probability of a visit to be about squared, but it was cubed.
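
The two fall-through products in Douglas_Knight's comment can be checked with exact arithmetic. This is a minimal sketch that multiplies the factors exactly as they are written in the comment; it does not recompute them from the underlying call/visit data, which is not reproduced here. The names `chain`, `size4`, and `size2` are hypothetical.

```python
from fractions import Fraction
from math import log, prod


def chain(factors):
    """Multiply (numerator, denominator) pairs exactly."""
    return prod((Fraction(n, d) for n, d in factors), start=Fraction(1))


# Factors copied verbatim from the comment, not derived from data.
size4 = [(4, 23), (4, 17), (4, 14), (4, 11), (4, 8), (4, 5)]
size2 = [(3, 21), (2, 18), (2, 16), (2, 15), (2, 13), (2, 12),
         (2, 10), (2, 9), (2, 7), (2, 6), (2, 4), (2, 4)]

p4 = chain(size4)  # full model, one alphabet of size 4
p2 = chain(size2)  # full model, two alphabets of size 2

print(p4)                       # 256/150535
print(f"{float(p4):.2e}")       # a bit under 1/500
print(f"{float(p2):.2e}")       # a bit under 10^-8
print(float(chain(size4[:2])))  # level 2, size-4 alphabet (about 4%)
print(float(chain(size2[:2])))  # level 2, size-2 alphabets (about 2%)

# Power relating the two full-model results: p2 is roughly p4 ** exponent.
exponent = log(p2) / log(p4)
print(round(exponent, 2))
```

The exponent comes out just under 3, which agrees with the "cubed rather than squared" observation.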