[A text with some decent discussion on the topic](http://www.inference.phy.cam.ac.uk/mackay/itila/book.html). At least one group with a shot at winning a major speech recognition benchmark competition uses information-theoretic ideas in developing its recognizer. Another development has been the use of error-correcting codes to assist in multi-class classification problems ([google "error correcting codes machine learning"](http://www.google.com/search?sourceid=chrome&ie=UTF-8&q=error+correcting+codes+machine+learning)); arguably this is the clearest example of a compression-inspired idea having a big impact in machine learning. I don't know how many people think about these problems in terms of information theory (since I don't have much access to their thoughts), but I do know at least two very competent researchers who, although they never bring it up outright in their papers, have an information-theoretic, compression-oriented way of posing and thinking about problems.
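To make the error-correcting-codes idea concrete, here is a minimal sketch using scikit-learn's `OutputCodeClassifier`: each class is assigned a binary codeword, one binary classifier is trained per codeword bit, and a test point is decoded to the class with the nearest codeword. The dataset and parameter choices below are illustrative assumptions of mine, not anything from the papers:

```python
# Minimal sketch: error-correcting output codes for multi-class classification.
# Dataset and hyperparameters are illustrative assumptions.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OutputCodeClassifier

# Toy 5-class problem.
X, y = make_classification(n_samples=500, n_features=20, n_informative=10,
                           n_classes=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Each class gets a random binary codeword of length code_size * n_classes;
# one binary classifier is trained per bit, and prediction decodes to the
# class whose codeword is closest to the predicted bit pattern.
ecoc = OutputCodeClassifier(LogisticRegression(max_iter=1000),
                            code_size=2.0, random_state=0)
ecoc.fit(X_train, y_train)
print("test accuracy:", ecoc.score(X_test, y_test))
```

The redundancy in the codewords is exactly the error-correction: even if a few of the per-bit classifiers are wrong on a given point, decoding to the nearest codeword can still recover the right class.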
I often try to think of how humans process speech in terms of information theory (an approach inspired by a couple of great thinkers in the area), and so I think it is useful for understanding and probing questions of sensory perception.
There’s also a whole literature on “sparse coding” (another compression-oriented idea, originally developed by biologists but since ported over by computer vision and a few speech researchers) whose promise in machine learning may not have been realized yet, though I have seen at least a couple of somewhat impressive applications of related techniques.
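For concreteness, here is a minimal sketch of the sparse-coding step itself: given a fixed dictionary D, find a sparse coefficient vector a minimizing ½‖x − Da‖² + λ‖a‖₁ via iterative soft-thresholding (ISTA). The random dictionary below is an illustrative stand-in; in the literature the dictionary is itself learned from data:

```python
# Minimal sketch of sparse coding via ISTA (iterative soft-thresholding).
# Solves min_a 0.5*||x - D a||^2 + lam*||a||_1 for a fixed dictionary D.
# The random dictionary here is an illustrative stand-in for a learned one.
import numpy as np

def ista_sparse_code(D, x, lam=0.1, n_iters=200):
    # Step size from the largest singular value of D (Lipschitz constant
    # of the gradient of the reconstruction term).
    L = np.linalg.norm(D, 2) ** 2
    a = np.zeros(D.shape[1])
    for _ in range(n_iters):
        # Gradient step on the reconstruction term ...
        g = a - (D.T @ (D @ a - x)) / L
        # ... followed by soft-thresholding, which zeroes small coefficients.
        a = np.sign(g) * np.maximum(np.abs(g) - lam / L, 0.0)
    return a

rng = np.random.default_rng(0)
D = rng.normal(size=(64, 256))                      # 64-dim signals, 256 atoms
D /= np.linalg.norm(D, axis=0)                      # unit-norm atoms
x = D[:, rng.choice(256, 5)] @ rng.normal(size=5)   # signal built from 5 atoms
a = ista_sparse_code(D, x)
print("nonzero coefficients:", np.count_nonzero(np.abs(a) > 1e-6))
print("reconstruction error:", np.linalg.norm(x - D @ a))
```

The compression reading is direct: the signal x is summarized by a handful of nonzero coefficients over a shared dictionary, rather than by all of its raw dimensions.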
This isn’t precisely what Daniel_Burfoot was talking about, but it’s a related idea based on “sparse coding”, and it has recently obtained good results in classification:
http://www.di.ens.fr/~fbach/icml2010a.pdf
Here the “theories” are hierarchical dictionaries (a discrete hierarchy index set plus a set of vectors) which perform a compression by creating reconstructions of the data. Although they weren’t developed with this in mind, support vector machines do this as well: one finds a small number of “support vectors” that essentially compress the information about the decision boundary in a classification problem (support vector machines are one of the very few things from machine learning since neural networks to have had a significant, successful impact elsewhere).
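A quick illustration of that compression view, again with scikit-learn (the dataset and kernel settings are my own assumptions for the sketch): after training, the decision function is fully determined by the support vectors, typically a small fraction of the training set:

```python
# Sketch: an SVM's decision boundary is determined entirely by its support
# vectors, so the training set is "compressed" down to that subset.
# Dataset and kernel settings are illustrative assumptions.
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=1000, centers=2, cluster_std=1.5, random_state=0)
clf = SVC(kernel="rbf", C=1.0).fit(X, y)

# Everything needed to evaluate clf.decision_function lives in these arrays;
# the remaining training points can be discarded.
print("training points :", len(X))
print("support vectors :", clf.support_vectors_.shape[0])
```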
The hierarchical dictionaries learned do contain a “theory” of the visual world in a sense, although an important point is that they do so in a way that is sensitive to the application at hand. Daniel_Burfoot leaves out much about how people actually go about implementing this line of thought.