A high level post on its use would be very interesting.
I think my main criticism of the Bayes approach is that it leads to the kind of work you are suggesting, i.e. having a person construct a model and then having a machine calculate its parameters.
I think that much of what we value in intelligent people is their ability to form the model themselves. By focusing on parameter updating we aren’t developing the AI techniques necessary for intelligent behavior. In addition, because correct updating does not guarantee good performance (the properties of the model dominate), we will always have to judge methods based on experimental results.
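To make the point about model properties dominating concrete, here is a minimal sketch. Everything in it is an assumption made for illustration: the coin-flip data, the two candidate hypothesis sets, and the function names. Both models are updated exactly by Bayes’ rule; the only difference is which hypotheses the person building the model chose to include.

```python
import math

def posterior(thetas, heads, tails):
    """Exact Bayesian update over a finite set of coin biases (uniform prior)."""
    # Work in log space so that long data sequences do not underflow.
    log_w = [heads * math.log(t) + tails * math.log(1 - t) for t in thetas]
    top = max(log_w)
    w = [math.exp(lw - top) for lw in log_w]
    total = sum(w)
    return [x / total for x in w]

def predictive_heads(thetas, post):
    """Posterior predictive probability that the next flip is heads."""
    return sum(t * p for t, p in zip(thetas, post))

# Hypothetical data: 90 heads and 10 tails.
heads, tails = 90, 10

# Hand-built model that excludes any bias above 0.5 (assumed for illustration).
narrow = [0.1, 0.2, 0.3, 0.4, 0.5]
# Broader model whose hypotheses cover biases from 0.05 to 0.95.
wide = [i / 20 for i in range(1, 20)]

p_narrow = predictive_heads(narrow, posterior(narrow, heads, tails))
p_wide = predictive_heads(wide, posterior(wide, heads, tails))

print(f"narrow model predicts heads with probability {p_narrow:.3f}")
print(f"wide model predicts heads with probability {p_wide:.3f}")
```

The narrow model updates perfectly correctly and still cannot predict heads with probability above 0.5, because nothing in its hypothesis set allows it. Only comparing its predictions against the data reveals that the model, not the updating, is at fault.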
Because we always come back to experimental results, whatever general AI strategy we develop is more likely to be one that searches for new ways to learn (with Bayesian model updating and SVMs as examples) and validates these strategies using experimental data, replicating the behaviour of the AI field as a whole.
I find it useful to think about how people solve problems and to examine the huge gulf between specific learning techniques and these approaches. For example, to replicate a Bayesian AI researcher, an AI needs to take a small amount of data and an incomplete informal model of the process that generates it (e.g. one based on informal metaphors of physical processes the researcher is familiar with), then find a way of formalising this informal model (so that its behaviour under all conditions can be calculated), and possibly do some theorem proving to investigate properties of the model. The researcher then applies potentially standard techniques to determine the model’s parameters and judges its worth based on experiment (potentially repeating the whole process if it doesn’t work).
By focusing on Bayesian approaches we aren’t developing techniques that can replicate these kinds of lateral and creative thinking behaviour. Saying there is only one valid form of inference is absurd because it doesn’t address these problems.
I feel that trying to force our problems to suit our tools is unlikely to lead to much progress. For example, unless we can model (and therefore largely solve) all of the problems we want an AI to address, we can’t create a “Really Good Model”.
Rather than manually developing formalisations of specific forms of similarity, we need an algorithm that learns different types of similarity and then constructs the formalisation itself (or not, as I don’t think we actually formalise our notions of similarity, yet we can still solve problems).
Automated theorem proving is a good example where the problems are well defined yet unique, so any algorithm that can construct proofs needs to see meta-patterns in other proofs and apply them. This brings home the difficulty of identifying what it means for things to be similar and also emphasises the incompleteness of a probabilistic approach: the proof that the AI is trying to construct has never been encountered before, so in order to benefit from experience it needs to invent a type of similarity that maps the current problem to past ones.
But even “learning to learn” is done in the context of a model; it’s just a higher-level model. There are in fact models that allow experience gained in one area to generalize to other areas (by saying that the same sorts of structures that are helpful for explaining things in one area should be considered in the other area). Talking about what an AI researcher would do is asking much more of an AI than one would ask of a typical human. If we could get an AI to be even as intelligent as a 3-year-old child then we would be more or less done. People don’t develop sophisticated problem-solving skills until at least high school age, so it seems difficult to believe that such a problem is fundamental to AGI.
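As a rough illustration of what a “higher-level model” can look like, here is a toy hierarchical sketch. The bags-of-marbles setup, the two named hypotheses, their Beta parameters, and the data are all assumptions chosen for illustration and are not taken from any reference in this thread. The higher level holds beliefs about what kind of bag tends to occur; each bag has its own unknown proportion of black marbles.

```python
import math

def log_beta(a, b):
    """Log of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def log_marginal(k, n, a, b):
    """Log probability of one particular sequence of n draws containing
    k black marbles, when the bag's black fraction has a Beta(a, b) prior."""
    return log_beta(a + k, b + n - k) - log_beta(a, b)

# Two hypothetical higher-level hypotheses about bags in general.
HYPERS = {
    "bags_are_pure": (0.1, 0.1),   # each bag is nearly all one colour
    "bags_are_mixed": (5.0, 5.0),  # each bag is roughly half and half
}

def hyper_posterior(bags):
    """Posterior over the higher-level hypotheses (uniform prior),
    given a list of (black_count, draw_count) pairs, one per bag."""
    log_w = {h: sum(log_marginal(k, n, a, b) for k, n in bags)
             for h, (a, b) in HYPERS.items()}
    top = max(log_w.values())
    w = {h: math.exp(lw - top) for h, lw in log_w.items()}
    total = sum(w.values())
    return {h: x / total for h, x in w.items()}

def next_black_after_one_black(previous_bags):
    """P(second draw from a new bag is black | its first draw was black)."""
    # The single draw from the new bag also counts as evidence about the
    # higher-level hypothesis, so it is included in the update.
    post = hyper_posterior(previous_bags + [(1, 1)])
    return sum(post[h] * (a + 1) / (a + b + 1) for h, (a, b) in HYPERS.items())

no_experience = next_black_after_one_black([])
experience = next_black_after_one_black([(10, 10), (0, 10), (10, 10), (0, 10)])

print(f"without earlier bags: {no_experience:.3f}")
print(f"after four single-colour bags: {experience:.3f}")
```

After seeing several single-colour bags, one marble from a new bag is enough to predict the rest of that bag with much more confidence. The experience transfers because the higher-level hypothesis is shared across bags, but note that a person still had to write down the space of higher-level hypotheses.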
Another reference, this time on learning to learn, although unfortunately it is behind a paywall (Tenenbaum, Goodman, Kemp, “Learning to learn causal models”).
It appears that there is also a book on more general (mostly non-Bayesian) techniques for learning to learn: Sebastian Thrun’s book. I got the latter just by googling, so I have no idea what’s actually in it, other than by skimming through the chapter descriptions. It’s also not available online.