So I was mostly averaging percentages across the years instead of counting, which isn’t great; knowing that it’s 30⁄65 makes me much more on board with “oh yeah there’s no signal there.”
But I think your comparison between hypotheses seems wrong; like, presumably it should be closer to a BIC-style test, where you decide if it’s worth storing the extra parameter p?
The Bayes factor calculation which I did is the analytical result for which BIC is an approximation (see this sequence). Generally BIC is a large N approximation but in this case they actually do end up being fairly similar even with low N.
So I was mostly averaging percentages across the years instead of counting, which isn’t great; knowing that it’s 30⁄65 makes me much more on board with “oh yeah there’s no signal there.”
But I think your comparison between hypotheses seems wrong; like, presumably it should be closer to a BIC-style test, where you decide if it’s worth storing the extra parameter p?
The Bayes factor calculation which I did is the analytical result for which BIC is an approximation (see this sequence). Generally BIC is a large N approximation but in this case they actually do end up being fairly similar even with low N.