Request for suggestions: ageing and data-mining
Imagine you had the following at your disposal:
A Ph.D. in a biological science, with a fair amount of reading and wet-lab work under your belt on the topic of aging and longevity (but in hindsight, nothing that turned out to leverage any real mechanistic insights into aging).
A M.S. in statistics. Sadly, the non-Bayesian kind for the most part, but along the way acquired the meta-skills necessary to read and understand most quantitative papers with life-science applications.
Love of programming and data, the ability to learn most new computer languages in a couple of weeks, and at least 8 years spent hacking R code.
Research access to large amounts of anonymized patient data.
Optimistically, two decades remaining in which to make it all count.
Imagine that your goal were to slow or prevent biological aging...
What would be the specific questions you would try to tackle first?
What additional skills would you add to your toolkit?
How would you allocate your limited time between the research questions in #1 and the acquisition of new skills in #2?
Thanks for your input.
Update
I thank everyone for their input and apologize for how long it has taken me to post an update.
I met with Aubrey de Grey and he recommended using the anonymized patient data to look for novel uses for already-prescribed drugs. He also suggested I do a comparison of existing longitudinal studies (e.g. Framingham) and the equivalent data elements from our data warehouse. I asked him, if he runs into any researchers with promising theories or methods who lack only a massive human dataset to test them on, to send them my way.
My original question was a bit too broad in retrospect: I should have focused more on how to best leverage the capabilities my project already has in place rather than a more general “what should I do with myself” kind of appeal. On the other hand, at the time I might have been less confident about the project’s success than I am now. Though the conversation immediately went off into prospective experiments rather than analyzing existing data, there were some great ideas there that may yet become practical to implement.
At any rate, a lot of this has been overcome by events. In the last six months I realized that before we even get to the bifurcation point between longevity and other research areas, there are a crapload of technical, logistical, and organizational problems to solve. I no longer have any doubt that these real problems are worth solving, my team is well positioned to solve many of them, and the solutions will significantly accelerate research in many areas including longevity. We have institutional support, we have a credible revenue stream, and no shortage of promising directions to pursue. The limiting factor now is people-hours. So, we are recruiting.
Thanks again to everyone for their feedback.
To answer your questions:
SENS has a page that might help answer the first question you posed above.
You could email Aubrey de Grey and ask for ideas. (The page I have linked above seems to suggest that he is highly open to receiving emails from intelligent people who are interested in doing anti-aging research, so don’t let the fact that he’s internet-famous prevent you from sending him a note).
In response to 2, I would say that it seems like you are already highly skilled, such that you could dive in and tackle any problem(s) you decide to start working on immediately. People gain skills by working on hard problems, so it doesn’t seem necessary for you to take additional time to explicitly hone your skill set before starting on any project(s) that you want to work on.
Thanks for reminding me about SENS and de Grey, I should email him. I should reach out to all the smart people in the research community I know well enough to randomly pester and collect their opinions on this.
Please do. And tell us the results.
Regarding their replies: I wonder whether you should rather take recommendations appearing most frequently or whether it is a better idea to take the most highly rated single recommendation.
The embarrassing truth is I spent so much time cramming stuff into my brain while trying to survive in academia that until now I haven’t really had time to think about the big picture. I just vectored toward what at any given point seemed like the direction that would give me the most options for tackling the aging problem. Now I’m finally as close to an optimal starting point as I can reasonably expect and the time has come to confront the question: “now what”?
I completely understand and sympathize with that feeling. I am about to graduate with an undergraduate degree in chemistry, and it was not until earlier this semester that I began to realize that I still don’t know what type of career path I want to pursue after doing graduate work in operations research, given that I am somewhat more inclined to go to graduate school than I am to go directly into industry.
Calico is hiring.
Might be worth looking into.
You’re probably in a better position to even ask the questions than most people here, but is telomere reduction still a promising area of aging research? A very smart geneticist in my family has always insisted that there’s some kind of telomerase wonder drug that we haven’t invented yet.
Lots of people still think that there is promise in telomerase therapy, yes. Eg, look up Michael Fossel.
Please do let us know what you ultimately decided, and why.
“A M.S. in statistics. Sadly, the non-Bayesian kind for the most part”
I’d hardly be ashamed of having a ‘non-Bayesian’ statistics degree. Bayes is referenced a lot in LW, and for good reason, but Bayes theorem is not all that difficult to understand, particularly for someone with your education. The most useful skill a knowledge of statistics can give you, arguably, is being able to objectively analyse and comprehend extremely large amounts of data.
Have you looked into the possibility of acquiring a research partner? It may be a more effective use of your time to predominantly take care of the statistical analysis and the biological experimentation while your partner (endowed with skills you don’t have time to learn yourself) can present fresh ideas for new research. This method would be prone to less bias and if it’s a race against time, you may not have enough to acquire an entirely new skill set.
The point isn’t understanding Bayes theorem. The point is methods that use Bayes theorem. My own statistics prof said that a lot of medical people don’t use Bayes because it usually leads to more complicated math.
That’s not the skill that’s taught in a statistics degree. Dealing with large amounts of biological data needs algorithms and quite often Bayes somewhere.
To me, the biggest problem with Bayes theorem or any other fundamental statistical concept, frequentist or not, is adapting it to specific, complex, real-life problems and finding ways to test its validity under real-world constraints. This tends to require a thorough understanding of both statistics and the problem domain.
Not explicitly, no. My only evidence is anecdotal. The statisticians and programmers I’ve talked to appear to overall be more rigorous in their thinking than biologists. Or at least better able to rigorously articulate their ideas (the Achilles heel of statisticians and programmers is that they systematically underestimate the complexity of biological systems, but that’s a different topic). I found that my own thinking became more organized and thorough over the course of my statistical training.
“My own statistics prof said...”
I am sure we are more than capable of looking beyond the scope of what your statistics professor had time to teach you at university. I have some knowledge and education of statistics myself, not that it makes me particularly more entitled to comment about it.
“That’s not the skill that’s taught in a statistics degree.”
I commend you for apparently having a statistics degree of some form. To suggest that analysing and comprehending large amounts of data isn’t taught in a statistics degree makes me question your statistics degree. I’m not saying your degree is any better or worse, perhaps just unique. Of course, comprehending large amounts of statistical data would lead to the use of algorithms to accurately explain the data. We rely on algorithms and mathematics for statistical analysis. Understanding the ‘complicated’ maths or Bayes theorem wouldn’t seem like that great a stretch given the OP’s education, which is my initial point.
I have studied bioinformatics and as such I have a particular idea of the domain of medical statistics and the domain of bioinformatics.
Big data often means that testing for 5% significance is a bad idea. As a result, people working on big biological data weren’t very welcome among the frequentists in medical statistics, and bioinformatics formed its own community.
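To make the point about 5% significance concrete, here is a minimal sketch using made-up numbers (the 0.01-SD effect and the group sizes are assumptions for illustration, not values from any real dataset). It shows that a difference far too small to matter clinically fails the 5% test at n = 100 per group but passes it overwhelmingly at n = 1,000,000, simply because the standard error shrinks with sample size.

```python
import math

def two_sided_p_from_z(z):
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))

def z_for_mean_difference(diff, sd, n_per_group):
    """z statistic for a difference in means between two equal-sized groups."""
    standard_error = sd * math.sqrt(2.0 / n_per_group)
    return diff / standard_error

# Hypothetical effect: a 0.01-SD difference between two groups.
for n in (100, 10_000, 1_000_000):
    z = z_for_mean_difference(diff=0.01, sd=1.0, n_per_group=n)
    print(n, round(z, 3), two_sided_p_from_z(z))
```

With data at this scale, the effect size and its interval tend to be more informative than whether a fixed threshold was crossed.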
That community split produces effects such as bioinformatics having its own server for R packages and not using the server on which the statistics folks put their R packages.
In another post in this thread bokov speaks of wanting to use Hidden Markov Models (HMM) for modeling. The HMM is the classic thing that is based on Bayes rule and that people in bioinformatics use a lot, but that’s not really taught in statistics.
Understanding Bayes theorem is not hard. Bayes theorem itself is trivial to learn. Understanding some complex algorithm for determining Hidden Markov Models based on Bayes rule is the harder part.
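As an illustration of how Bayes rule sits inside HMM algorithms, here is a minimal sketch of forward filtering, which is one of the standard HMM building blocks and not a full fitting procedure. The two-state "robust"/"frail" model and all of its probabilities are hypothetical, chosen only to make the example run.

```python
def forward_filter(observations, initial, transition, emission):
    """Return P(hidden state | observations so far) at each time step.

    initial[s]       : P(state_0 = s)
    transition[s][t] : P(next state = t | current state = s)
    emission[s][o]   : P(observation = o | state = s)
    """
    n_states = len(initial)
    belief = list(initial)
    filtered = []
    for step, obs in enumerate(observations):
        if step > 0:
            # Predict: push the previous belief through the transition matrix.
            belief = [
                sum(belief[s] * transition[s][t] for s in range(n_states))
                for t in range(n_states)
            ]
        # Update: Bayes rule, posterior is proportional to likelihood * prior.
        unnormalized = [emission[s][obs] * belief[s] for s in range(n_states)]
        total = sum(unnormalized)
        belief = [u / total for u in unnormalized]
        filtered.append(belief)
    return filtered

# Hypothetical model: state 0 = "robust", state 1 = "frail";
# observation 0 = normal lab value, observation 1 = abnormal lab value.
initial = [0.9, 0.1]
transition = [[0.95, 0.05], [0.0, 1.0]]  # frailty treated as absorbing
emission = [[0.8, 0.2], [0.3, 0.7]]
beliefs = forward_filter([0, 1, 1, 1], initial, transition, emission)
print(beliefs[-1])
```

Each update step is a single application of Bayes rule. The harder part mentioned above is estimating the transition and emission tables from data, which this sketch does not attempt.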
Machine Learning is also a different community than standard statistics. It’s also not only about Bayes theorem. There are machine learning algorithms that don’t use Bayes. Those algorithms are still different than what people usually do in statistics.
Take all the data you have, come up with some theory to describe it, build the scheme into a lossless data compressor, and invoke it on the data set. Write down the compression rate you achieve, and then try to do better. And better. And better. This goal will force you to systematically improve your understanding of the data.
(Note that transforming a sufficiently well specified statistical model into a lossless data compressor is a solved problem, and the solution is called arithmetic encoding—I can give you my implementation, or you can find one on the web. So what I’m really suggesting is just that you build statistical models of the raw data, and try systematically to improve those models).
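A minimal sketch of the idea, assuming a toy symbol stream in place of real patient data: the ideal lossless code length under a model is the sum of -log2 p(symbol), and an arithmetic coder gets close to that total. The data and both models below are made up for illustration, and this is not the implementation offered above.

```python
import math
from collections import Counter

def code_length_bits(sequence, model):
    """Ideal lossless code length of `sequence` under `model` (symbol -> probability)."""
    return sum(-math.log2(model[symbol]) for symbol in sequence)

# Hypothetical toy data: a stream of three diagnosis categories.
data = list("AAAAAAABBC" * 10)

# Naive model: every category equally likely.
uniform = {symbol: 1 / 3 for symbol in "ABC"}

# Better model: category frequencies estimated from the data itself.
counts = Counter(data)
fitted = {symbol: counts[symbol] / len(data) for symbol in counts}

uniform_bits = code_length_bits(data, uniform)
fitted_bits = code_length_bits(data, fitted)
print(uniform_bits, fitted_bits)
```

A fair comparison would also charge for the bits needed to describe the fitted model, which is what keeps the compression objective from rewarding overfitting.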
Would anyone want to literally do this on something as complex as patient data?
If not, why not just say try to come up with as good of models as you can?
Pick a couple of quantities of interest and try to model them as accurately as you can.
There is a problem that some data may really fundamentally be a distraction, and so modeling it is just a waste of time.
But it is very hard to tell ahead of time whether or not a piece of data is going to be relevant to a downstream analysis. As an example, in my work on text analysis, the issue of capitalization takes a lot of effort in proportion to how interesting it seems. It is tempting to just throw away caps information by lowercasing everything. But capitalization actually has clues that are relevant to parsing and other analysis—in particular, it allows you to identify acronyms, which usually stand for proper nouns.
This sounds a lot like advice that was given in response to a question in the open thread about how to go about a master’s thesis. I can’t find it, but I endorse the recommendation. Immerse yourself in the data. Attack it from different angles and try to compress it down as much as possible. The idea behind the advice is that if you understand the mechanics behind the process, the data can be generated from the process (imagine an image of a circle encoded as SVG instead of bitmap (or PNG)).
There are two caveats: 1) You can’t eliminate noise of course. 2) You are limited by your data set(s). For the former you know enough tools to separate the noise from the data and quantify it. For the latter you should join in external data sets. Your modelling might suggest which ones could improve your compression. E.g. try to link in SNP databases.
The unsolved problems are the ones hiding behind the token “sufficiently well specified statistical model”.
That said, thanks for the pointer to arithmetic encoding, that may be useful in the future.
Maybe send your CV to the Buck institute or associated researchers to see if you could collaborate?
In general I think that gene expression data will teach us a lot, and our knowledge of it is likely to increase as sequencing costs fall. There are some interesting and important statistical topics here, such as deconvoluting cell types in heterogeneous tissue samples to focus on signaling pathways.
Also see the GRG mailing list.
Also you can PM or email me if you want to chat about ideas.
Thanks for mentioning GRG and the Buck institute.
1) I visited the Buck institute at their recent open house and found out that they have an annual budget that is 10x SENS’s, and both institutions seem to have the same goal. Any idea why SENS is more talked about round these parts?
2) The person who runs the GRG mailing list, Johnny, is really friendly and linked me to many more longevity related events (e.g. Bay Area Aging Symposium, Health Extension meetups) and newsletters (e.g. The Longevity Reporter and Fight Aging!)
Happy I was able to spread useful information!
That SENS is mentioned more than the Buck institute is likely because they have a highly charismatic leader who’s written a popular book. The Buck institute is slightly more conventional and also gets NIH funding. Also the Buck institute and SENS collaborate, for example on undergrad research opportunities. I’ve donated to SENS myself and I obviously think they do useful work too.
Good to know re: friendliness of the person who runs the GRG mailing list.
I should clarify something: the types of problems I can most efficiently tackle are retrospective analysis of already-collected data.
Prospective clinical and animal studies are not out of the question, but given the investment in infrastructure and regulatory compliance they would need, these would have to be collaborations with researchers already pursuing such studies. This is on the table, but does not leverage the clinical data I already have (unless, in the case of clinical researchers, they are already at my institution or an affiliated one).
My idea at the moment is to fit a hidden Markov model and derive a state model for human aging. But this pile of clinical data I have has got to be useful for all kinds of other aging-related questions...
I have some philosophical objections to your approach. I’m not sure it’s such a good idea to focus exclusively on research questions that are explicitly aging-related, just because you’ll be limiting yourself to a subset of the promising ideas out there. Secondly, you probably shouldn’t worry about pursuing a project in which your already-collected data is useless, especially if that data or similar is also available to most other researchers in your field (if not, it would be very useful for you to try to make that data available to others who could do something with it). You’re probably more likely to make progress with interesting new data than interesting old data. Also, I’m not sure if this is your intention, but it seems to me that the goal of spending 20 years to slow or prevent aging is a recipe for wasting time. It’s such an ambitious goal, and so many people are already working on it, that any one researcher is unlikely to put a measurable dent in it. It’s like getting a math PhD and saying “Ok, now I’m going to spend the rest of my life trying to help solve the Riemann Hypothesis.” Especially when you’re just starting out, you may be better-served working on the most promising projects you can find in your general area of interest, even if their goals are less ambitious.
P.s. Sorry if a lot of what I’ve said is naive, I’ve never worked in academia.
In the last five years the NIH (National Institutes of Health) has never spent more than 2% of its budget on aging research. To a first approximation, the availability of grant support is proportional to the number of academic researchers, or at least to the amount of academic research effort being put into a problem. This is evidence against aging already getting enough attention. Especially considering that age is a major risk factor for just about every disease. It’s as if we tried to treat AIDS by spending 2% on HIV research and 98% on all the hundreds of opportunistic infections that are the proximal causes of any individual AIDS patient’s death. I would think that curing several hundred proximal problems is more ambitious than trying to understand and intervene in a few underlying causes.
I have no illusions of single-handedly curing aging in the next two decades. I will be as satisfied as any other stiff in the cryofacility if I manage to remove one or more major road-blocks to a practical anti-aging intervention or at least a well-defined and valid mechanistic model of aging.
This is ‘new’ data in the sense that it is only now becoming available for research purposes, and if I have my way, it is going to be in a very flexible and analysis-friendly format. It is the core mission of my team to make the data available to researchers (insofar as permitted by law, patients’ right to privacy, and contractual obligations to the owners of the data).
If I ran “academia”, tool and method development would take at least as much priority as traditional hypothesis-driven research. I think a major take-home message of LW is that hypotheses are a dime a dozen—what we need are practical ways to rank them and update their rankings on new data. A good tool that lets you crank through thousands of hypotheses is worth a lot more than any individual hypothesis. I have all kinds of fun ideas for tools.
But for the purposes of this post, I’m assuming that I’m stuck with the academia we have, I have access to a large anonymized clinical dataset, and I want to make the best possible use of it (I’ll address your points about aging as a choice of topic in a separate reply).
The academia we’re stuck with (at least in the biomedical field) effectively requires faculty to have a research plan describable by “Determine whether FOO is true or false” rather than “Create a FOO that does BAR”.
So the no-brainer approach would be for me to take the tool I most want to develop, slap some age-related disease onto it as a motivating use-case, and make that my grant. But this optimizes for the wrong thing: I don’t want to find excuses for engaging in fascinating intellectual exercises. I want to find the problems with the greatest potential to advance human longevity, and then bring my assets to bear on those problems even if the work turns out to be uglier and more tedious than my ideal informatics project.
The reason I’m asking for the LW community’s perspective on what’s on the critical path to human longevity is that I spent too much time around excuse-driven^H^H^H hypothesis-driven research to put too much faith in my own intuitions about what problems need to be solved.
I wasn’t arguing whether aging research should receive more attention, just that it receives enough to make a single researcher a drop in the bucket, but you might not be an average researcher. I’m interested in knowing, how likely do you think it is that the life expectancy of some people will be measurably lower if you work as a used-car salesman for the next 20 years rather than a researcher. I’m not suggesting that aging isn’t a worthwhile area of research, just that it may be counterproductive for you to be trying to make all the work you do for the next 20 years have some direct bearing on aging.
When I say a project is ambitious, I mean that it is very unlikely to return good results, but that the impact of those good results would be enormous. Developing a large number of drugs to increase the life expectancies of terminally ill cancer patients is less ambitious than trying to cure their cancer. You seem to be thinking that we have made so little progress on aging because it hasn’t received enough attention. What if it’s the other way around, and so few researchers tackle aging head-on because it’s hard to make meaningful progress on? I think that for any researcher who wants to provide mechanistic insights into aging, or figure out how the brain works, or create a machine with human-like general intelligence, there’s a large incentive for success, but almost inevitably such researchers need shorter term results to keep themselves going. If there simply aren’t any shorter term opportunities to make meaningful progress on, they run the risk of working on something that seems related to the problem they set out to solve, but in reality contributes only shallowly to their understanding of it. This is how you end up with so many attempts to better understand the brain through brain scans or make progress in machine intelligence by studying an absurdly specific situation. There were probably more meaningful things those researchers could have been doing that didn’t seem to fall under the heading of an extremely ambitious goal. You might be able to bypass these tendencies, but it won’t be easy; if it were easy, we would have more researchers who are making meaningful progress on aging.
Okay, neat! I have an idea, and it might be kind of farfetched, or not amenable to the types of analyses you are best at doing, but I’ll share it anyways. Here goes.
Given that there is a tradeoff between health and reproduction, I wonder if you could increase the expected lifespan of a healthy human male by having him take anti-androgens on a regular basis.
We already know that male eunuchs who are humans live longer than intact male humans. I suspect that most guys wouldn’t be willing to become eunuchs even if they valued having a long lifespan very highly, but being able to increase one’s expected lifespan by decreasing one’s testosterone levels while still remaining intact might be something that a few males would consider, if such a therapy were proven to be effective.
Anyways, after taking 10 minutes to look around on Google Scholar, I wasn’t able to find any papers suggesting that taking anti-androgens would be an effective anti-aging measure, so maybe this would be a viable project for someone to work on.
As an aside, I don’t know which mechanisms cause castrated men to live longer, but this seems relevant to the question of why/how castrated men live longer.
Great idea! Here’s how I can convert your prospective experiment into retrospective ones:
Comparing hazard functions for individuals with diagnoses of infertility versus individuals who originally enter the clinic record system due to a routine checkup.
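One way such a hazard comparison could start, sketched with made-up follow-up times in place of real clinic records: the Nelson-Aalen estimator of the cumulative hazard, computed separately for each cohort. The cohort labels and numbers are hypothetical, and a real analysis would need to deal with confounding and with how patients enter the record system.

```python
def nelson_aalen(durations, events):
    """Cumulative hazard estimate H(t) at each distinct event time.

    durations[i] : follow-up time for patient i
    events[i]    : True if death was observed, False if censored
    """
    event_times = sorted({t for t, e in zip(durations, events) if e})
    cumulative = 0.0
    curve = []
    for t in event_times:
        # Patients still under observation just before time t.
        at_risk = sum(1 for d in durations if d >= t)
        deaths = sum(1 for d, e in zip(durations, events) if e and d == t)
        cumulative += deaths / at_risk
        curve.append((t, cumulative))
    return curve

# Made-up follow-up data (years) for two hypothetical cohorts.
cohort_a = nelson_aalen([2, 3, 3, 5, 8], [True, True, False, True, False])
cohort_b = nelson_aalen([4, 6, 7, 9, 9], [True, False, True, False, False])
print(cohort_a)
print(cohort_b)
```

Plotting the two curves against time gives a first visual check of whether the hazards diverge, before moving on to a regression model with covariates.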
This is interesting, but a clear confound is that people who enter for infertility are likely to be more conscientious, which correlates with lifespan.
Whether male eunuchs actually live longer is controversial to say the least. Eg, the effect is not seen in dogs. In humans there are clear confounds.
Also, t levels don’t seem to clearly correlate with decreased or increased lifespans. And as your last link points out, lower levels of t (ie hypogonadism) are correlated with increased risk of CVD mortality.
Yes, you’re right about that. The paper says that:
However, the paper also says that:
In fact, since there is a tradeoff between health and reproductive ability, we might expect the development of health problems in previously healthy males to cause testosterone levels to drop, as a means of offsetting some of the negative effects of said health problem. This could account for why lower levels of testosterone are correlated with increased CVD mortality.
However,
is a statement which I emphatically disagree with.
In my view there is reasonable evidence for a trade-off between health and reproduction between species, but not within species. Am I wrong on this?
On eunuch lifespan, you are basically relying on three studies, each of which is historical, ie the Mental Health studies in the mid 20th century and the historical Korean eunuch study. I think there are big problems in interpreting these studies. For example, it’s not like the eunuch lifespans in either sample are as long as those of men in wealthy countries, which makes things like infections and generally risky behavior a much stronger candidate for the mechanism, which wouldn’t generalize to lifespan today. What am I wrong about here?
Again, why don’t we see the effect in dogs? http://www.straightdope.com/columns/read/3068/does-castration-longer-life
Let me be clear that I want you to be right. It suggests a clear mechanism to increasing lifespan in men. I just don’t think that there’s very strong evidence for it.
I would expect gene therapy to be the most likely field to have a major breakthrough in this area. The cost of genome sequencing has been falling rapidly. Although finding associations between genetic disorders and aging might be useless today, they could become quite critical in 10 years. That’s where I would focus your search.
Quite possibly, see the work of Maria Konovalenko in this area. However, this is not an easy path and is far, far more dependent on technology to safely introduce genes to cells within tissues in the body than knowledge of which genes to target. Consider that there are many diseases for which the causative genes have been known for some time, yet this has not yet been used successfully for almost all of them.
That technology pretty much exists already, it’s just extremely under-advertised for various reasons.
I highly recommend you look into the work of Joao Pedro de Magalhaes, who is doing diverse and interesting work in aging bioinformatics and aging science in general. Some excellent recent papers: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4203147/ http://www.nature.com/nrc/journal/v13/n5/abs/nrc3497.html and http://www.tandfonline.com/doi/abs/10.4161/15384101.2014.950151
Examine people who do intermittent fasting to promote autophagy.
So, for a retrospective approach with existing data, I could try to find a constellation of proxy variables in the ICD9 V-codes and maybe some lab values suggestive of basically healthy patients who consume a lower-than-typical amount of calories. Not in a very health-conscious part of the country though, so unlikely that a large number of patients would do this on purpose, let alone one specific fasting strategy.
Now, something I could do is team up with a local dietician or endocrinologist and recruit patients to try calorie restriction.
Why not run a pilot on yourself first? The nice thing about IF is that in many forms, it’s dead easy: you eat nothing one day, twice as much the next. Record some data on yourself for a few months (weight? blood glucose*? a full blood panel?), and you’ll have solid data about your own reactions to IF and a better idea what to look for.
Personally, I would be surprised if you could do worthwhile research on IF by mining research records: ‘eating food every day’ is nigh-universal, and most datasets are concerned entirely with what people eat, not when. You might have to get creative and do something like look for natural experiments involving fasting such as Ramadan.
* and don’t write off blood glucose as too painful or messy for non-diabetics to measure! Blood glucose strip testing turns out to be easier than I thought. I used up a package recently: while I nearly fainted the first time as my heart-rate plunged into the mid-50s because of my blood phobia, over the course of 10 strips I progressed to not minding and my heart-rate hardly budging.
The Ramadan natural experiment is interesting, this has been discussed wrt sport performance previously, eg see http://regressing.deadspin.com/is-fasting-for-ramadan-dangerous-at-the-world-cup-1598373130
A particular form of IF I’ve heard of from several places is even easier: only eat within an 8-hour window each day. I often do that out of sheer can’t-be-arsed-to-have-breakfastness.
(I hear that existing studies about that are pretty confounded, e.g. they find that people who don’t have breakfast are less healthy but the effect disappears when controlling for conscientiousness.)
No. With intermittent fasting your total calorie consumption isn’t necessarily below average, rather you have periods of time in which you either don’t eat or only eat fats. I do what’s called bulletproof intermittent fasting. One unexpected result is that I don’t get colds anymore because, I think, of autophagy. I used to get about four a year and I have been doing the fasting for a little over two years so this result is significant.
How do you postulate that autophagy reduces the risk of rhinovirus infection?
I last took biology in high school so don’t put too much trust in this or lower your opinion of me if this sounds silly, but: autophagy involves your body eating its own cells and your body is somehow smart enough to sometimes target harmful cells and therefore autophagy might be causing my body to destroy cells that get invaded by the rhinovirus infection.
OK, fair enough. I generally think of autophagy as being within a cell, but perhaps virally infected cell compartments are more likely to be “eaten” under nutrient stress, or perhaps my application of the word autophagy is incorrect here.
Here is what seems like a pretty good overview of intermittent fasting: http://easacademy.org/trainer-resources/article/intermittent-fasting
Um, calorie restriction in the necessary amounts is quite unpleasant and are you willing to commit to a multi-decade trial anyway..?