How do you actually obtain and report a likelihood function for scientific research?
I just read the article https://arbital.com/p/likelihoods_not_pvalues/, but this left me with one big question: what is the actual sequence of actions a researcher would take to report a likelihood function based on their research? For background, I’m an undergraduate student who has participated in multiple undergraduate research positions, and who has so far been uncertain about how to compute and use likelihood functions based on the data (mostly econometric data) involved in this research.
In particular, I want to find out:
How do you compute a likelihood function in Stata, R, Python, etc. based on data you have, for a given hypothesis class
How do you obtain a data file for this likelihood function that can be distributed in a way that other researchers can download and use in e.g. a meta-analysis
What kinds of graphs are good to visually represent these likelihood functions, particularly if the hypothesis class has multiple parameters, and
What kinds of summary statistics are good to numerically represent these likelihood functions in a human-readable way
How has this whole thing historically been done in fields like machine learning that have historically been more amenable to reporting likelihood functions
If you have good answers for any of these questions, I’d really like to know.
i have some reservations about the practicality of reporting likelihood functions and have never done this before, but here are some (sloppy) examples in python. Primarily answering number 1 and 3.
The two figures are:
I think this code will extend to any other likelihood-based model in statsmodels, not just OLS, but I haven’t tested.
It’s also worth familiarizing yourself with how the likelihoods are actually defined. For OLS we assume that residuals are normally distributed. For data points
y_i
atX_i
the likelihood for a linear model with independent, normal residuals is:L=∏ni=1exp(−(yi−Xiβ)/2σ2)/√2πσ2
where β is the parameters of the model, σ2 is the variance of the residuals, and n is the number of datapoints. So the likelihood function here is this value as a function of β (and maybe also σ2, see below).
So if we want to tell someone else our full likelihood function and not just evaluate it at a grid of points, it’s enough to tell them y and X. But that’s the entire dataset! To get a smaller set of summary statistics that capture the entire information, you look for ‘sufficient statistics’. Generally for OLS those are just XTy and XTX. I think that’s also enough to recreate the likelihood function up to a constant?
Note that σ2 matters for reporting the likelihood but doesn’t matter for traditional frequentist approaches like MLE and OLS since it ends up cancelling out when you’re doing finding the maximum or reporting likelihood ratios. This is inconvenient for reporting likelihood functions and I think the code I provided is just using the estimated σ2 from the MLE estimate. However, at the end of the day, someone using your likelihood function would really only be using it to extract likelihood ratios and therefore the σ2 probably doesn’t matter here either?
Off-topic tip: in addition to normal posts, LW also has a “Question” type of post which offers better UX for question-style posts like this one.
Sorry; I thought I had used the “Question” type.
I was absolutely certain I had responded to this, because I had taken the trouble to search for and locate a description of the procedure used in particle physics, which appears to be the central place where likelihood functions are the preferred tool.
Seems I wrote it but never submitted it, so in this here placeholder comment I vouchsafe to hunt that resource down again and put it here in an edit.
Edit: As I promised, the resource: https://ep-news.web.cern.ch/what-likelihood-function-and-how-it-used-particle-physics
This is a short article from by a person from CERN, Robert Cousins. It covers in brief what likelihood is and how it is different than probability, then a short description of three different methods of using a likelihood function (here listed as Likelihoodist, Neman-Pearson, and Bayesian), and then on to a slightly more advanced example. It has references which include some papers from the work on identifying the Higgs Boson, and some of his own relevant papers.