I think ‘algorithm’ is an imprecise term for this discussion.
Perhaps I used the term imprecisely—I basically meant it in a very general sense of being some process, set of rules etc. that a computer or other agent could follow to achieve the goal.
We need good decision theories to know when to search for more or better bottom-up models. What are we missing? How should we search? (When should we give up?)
The name for ‘algorithms’ (in the expansive sense) that can do what you’re asking is ‘general intelligence’. But we’re still working on understanding them!
Yes, I see the relevance of decision theories there, and that solving this well would require a lot of what would be needed for AGI. I guess when I originally asked, I was wondering if there might have been some insights people had worked out on the way to that: just any parts of such an algorithm that people have figured out, or that at least would reduce the error of a typical scientist. But maybe that will be another while yet...
I think you’re right that such an algorithm would need to make measurements of the real system, or systems with properties matching component parts (e.g. a tank of air for climate), and have some way to identify the best measurements to make. I guess determining whether there is some important effect that’s not been accounted for yet would require a certain amount of random experimentation to be done (e.g. for climate, heating up patches of land and tanks of ocean water by a few degrees and seeing what happens to the ecology, just as we might do).
This is not necessarily impractical for something like atmospheric or oceanic modelling, where we can run trustworthy high-resolution models over small spatial regions and get data on how things change with different boundary conditions, so we can tell how the coarse models should behave. Criteria would then be needed for deciding where and when to run these simulations. Regions where errors relative to Earth observations are large, and regions that change a lot under global warming, could be a high priority. I’d have to think about whether there could be a sensible systematic way of doing it. I guess it would require an estimate of how much the error in future predictions would decrease with the information gained from a particular experiment, which could perhaps be approximated using the sensitivity of the future prediction to the estimated error or uncertainty in predictions of a particular variable. I’d need to think about that more.
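To make that last idea a bit more concrete, here is a minimal sketch of one way such a ranking could work. It assumes a first-order approximation in which a variable contributes roughly (sensitivity × uncertainty)² to the variance of the future prediction, and it divides by a relative cost for each high-resolution run. The region names, numbers, and the `Candidate` class are all made up for illustration and are not taken from any real model.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """A hypothetical candidate experiment (e.g. a regional high-resolution run)."""
    name: str
    sensitivity: float   # assumed d(future prediction) / d(variable)
    uncertainty: float   # assumed standard error of the variable
    cost: float = 1.0    # assumed relative cost of running the experiment


def variance_contribution(c: Candidate) -> float:
    # First-order estimate of how much this variable's uncertainty
    # contributes to the variance of the future prediction.
    return (c.sensitivity * c.uncertainty) ** 2


def rank_candidates(candidates):
    # Highest estimated variance reduction per unit cost first.
    return sorted(
        candidates,
        key=lambda c: variance_contribution(c) / c.cost,
        reverse=True,
    )


# Illustrative numbers only.
candidates = [
    Candidate("tropical_convection", sensitivity=2.0, uncertainty=0.5),
    Candidate("arctic_sea_ice", sensitivity=1.5, uncertainty=1.0, cost=2.0),
    Candidate("midlatitude_ocean", sensitivity=0.3, uncertainty=0.4),
]
ranking = [c.name for c in rank_candidates(candidates)]
print(ranking)
```

This treats variables as independent and assumes an experiment removes most of a variable's uncertainty, which is optimistic. A fuller treatment would estimate the expected information gain of each experiment rather than use a single sensitivity number.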
I was wondering if there might have been some insights people had worked out on the way to that—just any parts of such an algorithm that people have figured out, or that at least would reduce the error of a typical scientist.
There are some pretty general learning algorithms, and even ‘meta-learning’ algorithms in the form of tools that attempt to more or less automatically discover the best model (among some number of possibilities). Machine learning hyper-parameter optimization is an example in that direction.
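As a toy illustration of what hyper-parameter optimization means in practice, here is a minimal sketch that picks the number of neighbours `k` for a nearest-neighbour regressor by comparing errors on held-out data. The data are synthetic (a noisy sine curve), the function names are my own, and real tools search far larger spaces with smarter strategies than this exhaustive loop.

```python
import math
import random


def knn_predict(train, x, k):
    # Average the targets of the k training points closest to x.
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return sum(y for _, y in nearest) / k


def validation_mse(train, valid, k):
    # Mean squared error on the held-out points for a given k.
    return sum((knn_predict(train, x, k) - y) ** 2 for x, y in valid) / len(valid)


def select_k(train, valid, candidate_ks):
    # Try every candidate and keep the one with the lowest held-out error.
    scores = {k: validation_mse(train, valid, k) for k in candidate_ks}
    best_k = min(scores, key=scores.get)
    return best_k, scores


# Synthetic data: y = sin(x) + Gaussian noise (an assumption for the demo).
rng = random.Random(0)
data = []
for _ in range(200):
    x = rng.uniform(0.0, 6.0)
    data.append((x, math.sin(x) + rng.gauss(0.0, 0.3)))

train, valid = data[:150], data[150:]
best_k, scores = select_k(train, valid, [1, 3, 5, 9, 15, 41])
print(best_k, scores[best_k])
```

Note that this only searches within one fixed family of models. It does nothing to tell you whether the family itself is missing something important, which is the harder question here.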
My outside view is that a lot of scientists should focus on running better experiments. According to a possibly apocryphal story told by Richard Feynman in a commencement address, one researcher discovered (at least some of) the controls one had to employ to be able to effectively study rats running mazes. Unfortunately, no one else bothered to employ those controls (let alone look for others)! Similarly, a lot of scientific studies or experiments are simply too small to produce statistically reliable results. There’s probably a lot of such low-hanging fruit available. Though note that this is often a ‘bottom-up’ contribution for ‘modeling’ a larger complex system.
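To put a rough number on the ‘too small’ point, here is a minimal sketch of a power calculation for comparing two group means. It assumes a two-sided z-test with known unit variance in both groups, which is a simplification (a t-test would be more usual), and the effect size of 0.5 standard deviations is just an illustrative choice.

```python
from statistics import NormalDist


def power_two_sample_z(effect_size, n_per_group, alpha=0.05):
    """Probability of detecting a true difference of `effect_size`
    standard deviations with a two-sided z-test (unit variance assumed)."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Expected value of the test statistic under the true effect.
    shift = effect_size * (n_per_group / 2) ** 0.5
    # Chance the statistic lands beyond either critical value.
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


for n in (10, 30, 100):
    print(n, round(power_two_sample_z(0.5, n), 2))
```

Under these assumptions, a study with 10 subjects per group would detect a medium-sized effect only about one time in five, while 100 per group would detect it well over 90% of the time.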
But as you demonstrate in your last two paragraphs, searching for a better ‘ontology’ for your models, e.g. deciding what else to measure, or what to measure instead, is a seemingly open-ended amount of work! There probably isn’t a way to avoid having to think about it more (beyond making other kinds of things that can think for us), at least until you find an ontology that’s ‘good enough’. Regardless, we’re very far from being able to avoid even small amounts of this kind of work.
Thanks again. OK, I’ll try using Markdown...