Thanks for posting this; I appreciate reading different perspectives on AI value alignment, especially from AI researchers.
But, truthfully, it would not require a ghost-in-the-machine to reexamine the situation if there was some kind of gross inconsistency with what the humans intended: there could be some other part of its programming (let’s call it the checking code) that kicked in if there was any hint of a mismatch between what the AI planned to do and what the original programmers were now saying they intended. There is nothing difficult or intrinsically wrong with such a design.
If there is some good way of explaining plans to programmers such that programmers will only approve of non-terrible plans, then yes, this works. However, here is contained most of the problem. The AI will likely have a concept space that does not match a human’s concept space, so it will need to do some translation between the two spaces in order to produce something the programmers can understand. But, this requires (1) learning the human concept space and (2) translating the AI’s representation of the situation into the human’s concept space (as in ontological crises). This problem is FAI-complete: given a solution to this, we could learn the human’s concept of “good” and then find possible worlds that map to this “good” concept. See also Eliezer’s reply to Holden on tool AI.
It might not be necessary to solve the problem in full generality: perhaps we can create systems that plan well in limited domains while avoiding edge cases. But it is also quite difficult to do this without severe restrictions in the system’s generality.
The motivation and goal management (MGM) system would be expected to use the same kind of distributed, constraint relaxation mechanisms used in the thinking process (above), with the result that the overall motivation and values of the system would take into account a large degree of context, and there would be very much less of an emphasis on explicit, single-point-of-failure encoding of goals and motivation.
I’m curious how something like this works. My current model of “swarm relaxation” is something like a Markov random field. One of my main paradigms for thinking about AI is probabilistic programs, which are quite similar to Markov random fields (but more general). I know that weak constraint systems are quite useful for performing Bayesian inference in a way that takes context into account. With a bit of adaptation, it’s possible to define probabilistic programs that pick actions that lead to good outcomes (by adding a “my action” node and a weak constraint on other parts of the probabilistic model satisfying certain goals; this doesn’t exactly work because it leads to “wishful thinking”, but in principle it can be adapted). But, I don’t think this is really that different from defining a probabilistic world model, defining a utility function over it, and then taking actions that are more likely to lead to high expected utility. Given this, you probably have some other model in mind for how values can be integrated into a weak constraint system, and I’d like to read about it.
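To make the analogy concrete, here is a minimal sketch of what I mean by “something like a Markov random field”: a few binary variables tied together by soft pairwise potentials (the variables and numbers are invented purely for illustration), with Gibbs sampling settling them into a contextually consistent interpretation.

```python
import random

# Three binary variables tied together by soft pairwise "weak constraints".
def potential(state):
    score = 1.0
    score *= 2.0 if state["is_night"] == state["lights_on"] else 1.0    # night <-> lights on
    score *= 1.5 if state["is_indoors"] == state["lights_on"] else 1.0  # indoors <-> lights on
    return score

def gibbs_marginals(evidence, n_sweeps=5000):
    variables = ["is_indoors", "lights_on", "is_night"]
    state = {v: evidence.get(v, random.random() < 0.5) for v in variables}
    counts = {v: 0 for v in variables}
    for _ in range(n_sweeps):
        for v in variables:
            if v in evidence:
                continue                       # clamped by observation (the "context")
            weights = {}
            for value in (False, True):
                state[v] = value
                weights[value] = potential(state)
            p_true = weights[True] / (weights[True] + weights[False])
            state[v] = random.random() < p_true
        for v in variables:
            counts[v] += int(state[v])
    return {v: counts[v] / n_sweeps for v in variables}

# Context shifts every other belief: observing "night" raises belief in "lights on".
print(gibbs_marginals({"is_night": True}))
print(gibbs_marginals({"is_night": False}))
```

The “weak constraints” here are just soft compatibility scores rather than hard rules, and clamping a different piece of evidence shifts every other marginal.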
But if it is also programmed to utterly ignore that fallibility—for example, when it follows its compulsion to put everyone on a dopamine drip, even though this plan is clearly a result of a programming error—then we must ask the question: how can the machine be both superintelligent and able to ignore a gigantic inconsistency in its reasoning?
We need to make a model that the AI can use in which its goal system “might be wrong”. It needs a way to look at evidence and conclude that, due to it, some goal is more or less likely to be the correct one. This is highly nontrivial. The model needs to somehow connect “ought”s to “is”s in a probabilistic, possibly causal fashion. While, relative to a supergoal, subgoals can be reweighted based on new information using standard Bayesian utility maximization, I know of no standard the AI could use to revise its supergoal based on new information. If you have a solution to the corrigibility problem in mind, I’d like to hear it.
Another way of stating the problem is: if you revise a goal based on some evidence, then either you had some reason for doing this or not. If so, then this reason must be expressed relative to some higher goal, and we either never change this higher goal or (recursively) need to explain why we changed it. If not, then we need some other standard for choosing goals other than comparing them to a higher goal. I see no useful way of having a non-fixed supergoal.
if the AGI is going to throw a wobbly over the dopamine drip plan, what possible reason is there to believe that it did not do this on other occasions? Why would anyone suppose that this AGI ignored an inconvenient truth on only this one occasion?
I think the difference here is that, if only the supergoal is “wrong” but everything else about the system is highly optimized towards accomplishing the supergoal, then the system won’t stumble along the way, it will (by definition) do whatever accomplishes its supergoal well. So, “having the wrong supergoal” is quite different from most other reasoning errors in that it won’t actually prevent the AI from taking over the world.
Knowing about the logical train wreck in its design, the AGI is likely to come to the conclusion that the best thing to do is seek a compromise and modify its design so as to neutralize the Doctrine of Logical Infallibility. The best way to do this is to seek a new design that takes into account as much context—as many constraints—as possible.
It seems like you’re equating logical infallibility about facts (including facts about the world and mathematical facts) with logical infallibility about values. Of course any practical system will need to deal with uncertainty about the world and logic, probably using something like a weak constraint system. But it’s totally possible to create a system that has this sort of uncertainty without any uncertainty about its supergoal.
When you use the phrase “the best way to do this”, you are implicitly referring to some goal that weak constraint systems satisfy better than fixed-supergoal systems, but what sort of goal are we talking about here? If the original system had a fixed supergoal, then this will be exactly that fixed goal, so we’ll end up with a mishmash of the original goal and a weak constraint system that reconfigures the universe to satisfy the original goal.
I am going to have to respond piecemeal to your thoughtful comments, so apologies in advance if I can only get to a couple of issues in this first response.
Your first remark, which starts
If there is some good way...
contains a multitude of implicit assumptions about how the AI is built, and how the checking code would do its job, and my objection to your conclusion is buried in an array of objections to all of those assumptions, unfortunately. Let me try to bring some of them out into the light:
1) When you say
If there is some good way of explaining plans to programmers such that programmers will only approve of non-terrible plans...
I am left wondering what kind of scenario you are picturing for the checking process. Here is what I had in mind. The AI can quickly assess the “forcefulness” of any candidate action plan by asking itself whether the plan will involve giving choices to people vs. forcing them to do something whether they like it or not. If a plan is of the latter sort, more care is needed, so it will canvass a sample of people to see if their reactions are positive or negative. It will also be able to model people (as it must be able to do, because all intelligent systems must be able to model the world pretty accurately or they don’t qualify as ‘intelligent’) so it will probably have a pretty shrewd idea already of whether people will react positively or negatively toward some intended action plan.
If the AI starts to get even a hint that there are objections, it has to kick in a serious review of the plan. It will ask everyone (it is an AI, after all: it can do that even if there are 100 billion people on the planet). If it gets feedback from anyone saying that they object to the plan, that is the end of the story: it does not force anyone to go through with it. That means, it is a fundamental feature of the checking code that it will veto a plan under that circumstance. Notice, by the way, that I have generalized “consulting the programmers” to “consulting everyone”. That is an obvious extension, since the original programmers were only proxies for the will of the entire species.
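To be concrete about the shape of that logic, here is a bare-bones sketch of the checking code just described. The two helper functions are stubs standing in for the AI’s own modelling machinery; nothing here is meant as an actual design, only as a picture of the control flow.

```python
import random

# The two helpers are stubs standing in for the AI's own modelling machinery.
def is_forceful(plan):
    return plan.get("forceful", False)

def predicted_reaction(person, plan):
    return person.get("reaction", "positive")

def checking_code(plan, population, sample_size=1000):
    # A plan that only offers people choices needs no further review.
    if not is_forceful(plan):
        return "proceed"
    # A forceful plan gets more care: canvass a sample for any hint of objection.
    sample = random.sample(population, min(sample_size, len(population)))
    if any(predicted_reaction(p, plan) == "negative" for p in sample):
        # Serious review: ask everyone; a single objection is the end of the story.
        if any(predicted_reaction(p, plan) == "negative" for p in population):
            return "veto"
    return "proceed"

population = [{"reaction": "positive"} for _ in range(99)] + [{"reaction": "negative"}]
print(checking_code({"forceful": True}, population))   # -> "veto"
```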
In all of that procedure I just described, why would the explanation of the plans to the people be problematic? People will ask questions about what the plans involve. If there is technical complexity, they will ask for clarification. If the plan is drastic there will be a world-wide debate, and some people who find themselves unable to comprehend the plan will turn to more expert humans for advice. And if even the most expert humans cannot understand the significance of the plan, what do you imagine would happen? I suggest that the most obvious reaction would be “Sorry, that plan is so obscure, and its consequences are so impossible for us to even understand, that we, a non-zero fraction of the human species, would like to invoke a precautionary principle and simply refuse to go ahead with it.”
That seems, to me at least, to get around the idea that there might be such a severe mismatch between human and AI understanding of the AI’s plans, that something bad would happen during the attempt to understand the plan.
In other words, your opening comment
If there is some good way of explaining plans to programmers such that programmers will only approve of non-terrible plans, then yes, this works
seems to have been 100% addressed by the procedure I just described: if the plans could not be explained, the checking code would simply accept that the will of the people prevails even when they say “We decline on the grounds that we cannot understand the complexity or implications of your plans.”
I see I have only gotten as far as the very first sentence of your comment, but although I have many more points that I could deploy in response to the rest, doesn’t that close the case, since you said that it would work?
The AI can quickly assess the “forcefulness” of any candidate action plan by asking itself whether the plan will involve giving choices to people vs. forcing them to do something whether they like it or not. If a plan is of the latter sort, more care is needed, so it will canvass a sample of people to see if their reactions are positive or negative.
So, I think this touches on the difficult part. As humans, we have a good idea of what “giving choices to people” vs. “forcing them to do something” looks like. This concept would need to resolve some edge cases, such as putting psychological manipulation in the “forceful” category (even though it can be done with only text). A sufficiently advanced AI’s concept space might contain a similar concept. But how do we pinpoint this concept in the AI’s concept space? Very likely, the concept space will be very complicated and difficult for humans to understand. It might very well contain concepts that look a lot like the “giving choices to people” vs. “forcing them to do something” distinction on multiple examples, but are different in important ways. We need to pinpoint it in order to make this concept part of the AI’s decision-making procedure.
It will also be able to model people (as it must be able to do, because all intelligent systems must be able to model the world pretty accurately or they don’t qualify as ‘intelligent’) so it will probably have a pretty shrewd idea already of whether people will react positively or negatively toward some intended action plan.
This seems pretty similar to Paul’s idea of a black-box human in the counterfactual loop. I think this is probably a good idea, but the two problems here are (1) setting up this (possibly counterfactual) interaction in a way that it approves a large class of good plans and rejects almost all bad plans (see the next section), and (2) having a good way to predict the outcome of this interaction usually without actually performing it. While we could say that (2) will be solved by virtue of the superintelligence being a superintelligence, in practice we’ll probably get AGI before we get uploads, so we’ll need some sort of semi-reliable way to predict humans without actually simulating them. Additionally, the AI might need to self-improve to be anywhere near smart enough to consider this complex hypothetical, and so we’ll need some kind of low-impact self-improvement system. Again, I think this is probably a good idea, but there are quite a lot of issues with it, and we might need to do something different in practice. Paul has written about problems with black-box approaches based on predicting counterfactual humans here and here. I think it’s a good idea to develop both black-box solutions and white-box solutions, so we are not over-reliant on the assumptions involved in one or the other.
In all of that procedure I just described, why would the explanation of the plans to the people be problematic? People will ask questions about what the plans involve. If there is technical complexity, they will ask for clarification. If the plan is drastic there will be a world-wide debate, and some people who find themselves unable to comprehend the plan will turn to more expert humans for advice.
What language will people’s questions about the plans be in? If it’s a natural language, then the AI must be able to translate its concept space into the human concept space, and we have to solve a FAI-complete problem to do this. If it’s a more technical language, then humans themselves must be able to look at the AI’s concept space and understand it. Whether this is possible very much depends on how transparent the AI’s concept space is. Something like deep learning is likely to produce concepts that are very difficult for humans to understand, while probabilistic programming might produce more transparent models. How easy it is to make transparent AGI (compared to opaque AGI) is an open question.
We should also definitely be wary of a decision rule of the form “find a plan that, if explained to humans, would cause humans to say they understand it”. Since people are easy to manipulate, raw optimization for this objective will produce psychologically manipulative plans that people will incorrectly approve of. There needs to be some way to separate “optimize for the plan being good” from “optimize for people thinking the plan is good when it is explained to them”, or else some way of ensuring that humans’ judgments about these plans are accurate.
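To make the distinction concrete, here is a toy sketch (every name and number in it is invented) of why raw optimization for predicted approval diverges from optimization for actual goodness once manipulative plans are available:

```python
# Toy plan space: each plan has a true goodness and a "persuasiveness" that
# determines how strongly humans would approve of it once it is explained.
plans = [
    {"name": "honest_good_plan",  "goodness": 0.8, "persuasiveness": 0.7},
    {"name": "manipulative_plan", "goodness": 0.1, "persuasiveness": 0.99},
]

def predicted_approval(plan):
    # The worrying decision rule: optimize for what humans would say about the plan.
    return plan["persuasiveness"]

def true_goodness(plan):
    # What we actually wanted to optimize, but cannot query directly.
    return plan["goodness"]

print(max(plans, key=predicted_approval)["name"])  # manipulative_plan
print(max(plans, key=true_goodness)["name"])       # honest_good_plan
```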
Again, it’s quite plausible that the AI’s concept space will contain some kind of concept that distinguishes between these different types of optimization; however, humans will need to understand the AI’s concept space in order to pinpoint this concept so it can be integrated into the AI’s decision rule.
I should mention that I don’t think that these black-box approaches to AI control are necessarily doomed to failure; rather, I’m pointing out that there are lots of unresolved gaps in our knowledge of how they can be made to work, and it’s plausible that they are too difficult in practice.
I see where you are coming from in what you have just said, but to give a good answer I need to take a high-level stance toward what you are saying. This is because there is a theme running through your ideas, here, and it is the theme, rather than the specifics, that I need to address.
You have mentioned on several occasions the idea that “AGI-concepts” and “Human-concepts” might not align, with the result that we might have difficulty understanding what they are really meaning when they use a given concept. In particular, you use the idea that there could be some bad misalignments of concepts—for example, when the AGI makes a conceptual distinction between “giving choices to people” and “forcing them to do something”, and even though our own version of that same distinction corresponds closely to the AGI’s version most of the time, there are some peculiar circumstances (edge cases) where there is a massive or unexpectedly sharp discrepancy.
Putting this idea in the form of an exaggerated, fictional example, it is as if we meet a new culture out in the middle of Darkest Africa, and in the course of translating their words into ours we find a verb that seems to mean “cook”. But even though there are many examples (cooking rice, cooking bread, cooking meat, and even brewing a cup of tea) that seem to correspond quite closely, we suddenly find that they ALSO use this to refer to a situation where someone writes their initials on a tree, and another case where they smash someone’s head with a rock. And the natives claim that this is not because the new cases are homonyms, they claim that this is the very same concept in all cases.
We might call this a case of “alien semantics”.
The first thing to say about this, is that it is a conceptual minefield. The semantics (or ontological grounding) of AI systems is, in my opinion, one of the least-well developed parts of the whole field. People often pay lip-service to some kind of model-theoretical justification for an AI’s semantic foundations, but in practice this actually means very little, since the theoretical ideas shade off into philosophy, have some huge unresolved gaps in them, and frequently take recourse in infinitely large (i.e. uncomputable) mappings between sets of ‘possible worlds’. Worst of all, the area is rife with question-begging (like using technical vocabulary which itself has a poorly defined semantics to try to specify exactly what ‘semantics’ is!).
Why does that matter? Because many of the statements that people make about semantic issues (like the alien semantics problem) are predicated on precisely which semantic theory they subscribe to. And, it is usually the case that their chosen semantic theory is just a vague idea that goes somewhat in the direction of Tarski, or in the direction of Montague, or maybe just what they read in Russell and Norvig. The problem is that those semantic theories have challengers (some of them not very well defined, but even so...), such as Cognitive Semantics, and those other semantic formalisms have a truly gigantic impact on some of the issues we are discussing here.
So, for example, there is an interpretation of semantics that says that it is not even coherent to talk about two concept landscapes that are semantic aliens. To be sure, this can happen in language—things expressible in one language can be very hard to say in another language—but the idea that two concept spaces can be in some way irreconcilable, or untranslatable, would be incoherent (not “unlikely” but actually not possible).
[A brief word about how that could be the case. If concepts are defined by large clusters of constraints between concepts, rather than precise, atomic relations of the sort you find in logical formalisms, then you can always deal with situations in which two concepts seem near to one another but do not properly overlap: you can form some new, translator concepts that take a complex union of the two. There is a lot of talk that can be given about how that complex union takes place, but here is one very important takeaway: it can always be made to happen in such a way that there will not, in the future, be any Gotcha cases (those where you thought you did completely merge the two concepts, but where you suddenly find a peculiar situation where you got it disastrously wrong). The reason why you won’t get any Gotcha cases is that the concepts are defined by large numbers of weak constraints, and no strong constraints—in such systems, the effect of smaller and smaller numbers of concepts can be guaranteed to converge to zero. (This happens for the same reason that the effect of smaller and smaller sub-populations of the molecules in a gas will converge to zero as the population sizes go to zero). Finally, you will notice the appearance of the key phrase “large clusters of constraints” in what I just explained …… that should be familiar. This is precisely the semantics of those Swarm Relaxation systems that I talked about in the paper.]
So, one of the implications of that kind of semantics is that different intelligent systems that use the basic idea of massive, weak constraint clusters to build concepts will tend to converge on the same semantics.
With all of the above in mind, a quick survey of some of the things that you just said, with my explanation for why each one would not (or probably would not) be as much of an issue as you think:
As humans, we have a good idea of what “giving choices to people” vs. “forcing them to do something” looks like. This concept would need to resolve some edge cases, such as putting psychological manipulation in the “forceful” category (even though it can be done with only text).
For a massive-weak-constraint system, psychological manipulation would be automatically understood to be in the forceful category, because the concept of “psychological manipulation” is defined by a cluster of features that involve intentional deception, and since the “friendliness” concept would ALSO involve a cluster of weak constraints, it would include the extended idea of intentional deception. It would have to, because intentional deception is connected to doing harm, which is connected with unfriendly, etc.
Conclusion: that is not really an “edge” case in the sense that someone has to explicitly remember to deal with it.
Very likely, the concept space will be very complicated and difficult for humans to understand.
We will not need to ‘understand’ the AGI’s concept space too much, if we are both using massive weak constraints, with convergent semantics. This point I addressed in more detail already.
This seems pretty similar to Paul’s idea of a black-box human in the counterfactual loop. I think this is probably a good idea, but the two problems here are (1) setting up this (possibly counterfactual) interaction in a way that it approves a large class of good plans and rejects almost all bad plans (see the next section), and (2) having a good way to predict the outcome of this interaction usually without actually performing it. While we could say that (2) will be solved by virtue of the superintelligence being a superintelligence, in practice we’ll probably get AGI before we get uploads, so we’ll need some sort of semi-reliable way to predict humans without actually simulating them. Additionally, the AI might need to self-improve to be anywhere near smart enough to consider this complex hypothetical, and so we’ll need some kind of low-impact self-improvement system. Again, I think this is probably a good idea, but there are quite a lot of issues with it, and we might need to do something different in practice. Paul has written about problems with black-box approaches based on predicting counterfactual humans here and here. I think it’s a good idea to develop both black-box solutions and white-box solutions, so we are not over-reliant on the assumptions involved in one or the other.
What you are talking about here is the idea of simulating a human to predict their response. Now, humans already do this in a massive way, and they do not do it by making gigantic simulations, but just by doing simple modeling. And, crucially, they rely on the massive-weak-constraints-with-convergent-semantics (you can see now why I need to coin the concise term “Swarm Relaxation”) between the self and other minds to keep the problem manageable.
That particular idea—of predicting human response—was not critical to the argument that followed, however.
What language will people’s questions about the plans be in? If it’s a natural language, then the AI must be able to translate its concept space into the human concept space, and we have to solve a FAI-complete problem to do this.
No, we would not have to solve a FAI-complete problem to do it. We will be developing the AGI from a baby state up to adulthood, keeping its motivation system in sync all the way up, and looking for deviations. So, in other words, we would not need to FIRST build the AGI (with potentially dangerous alien semantics), THEN do a translation between the two semantic systems, THEN go back and use the translation to reconstruct the motivation system of the AGI to make sure it is safe.
Much more could be said about the process of “growing” and “monitoring” the AGI during the development period, but suffice it to say that this process is extremely different if you have a Swarm Relaxation system vs. a logical system of the sort your words imply.
We should also definitely be wary of a decision rule of the form “find a plan that, if explained to humans, would cause humans to say they understand it”.
This hits the nail on the head. This comes under the heading of a strong constraint, or a point-source failure mode. The motivation system of a Swarm Relaxation system would not contain “decision rules” of that sort, precisely because they could have large, divergent effects on the behavior. If motivation is, instead, governed by large numbers of weak constraints, then in this case your decision rule would be seen as a type of deliberate deception, or manipulation, of the humans. And that contradicts a vast array of constraints that are consistent with friendliness.
Again, it’s quite plausible that the AI’s concept space will contain some kind of concept that distinguishes between these different types of optimization; however, humans will need to understand the AI’s concept space in order to pinpoint this concept so it can be integrated into the AI’s decision rule.
Same as previous: with a design that does not use decision rules that are prone to point-source failure modes, the issue evaporates.
To summarize: much depends on an understanding of the concept of a weak constraint system. There are no really good readings I can send you (I know I should write one), but you can take a look at the introductory chapter of McClelland and Rumelhart that I gave in the references to the paper.
Also, there is a more recent reference to this concept, from an unexpected source. Yann LeCun has been giving some lectures on Deep Learning in which he came up with a phrase that could have been used two decades ago to describe exactly the sort of behavior to be expected from SA systems. He titles his lecture “The Unreasonable Effectiveness of Deep Learning”. That is a wonderful way to express it: swarm relaxation systems do not have to work (there really is no math that can tell you that they should be as good as they are), but they do. They are “unreasonably effective”.
There is a very deep truth buried in that phrase, and a lot of what I have to say about SA is encapsulated in it.
Okay, thanks a lot for the detailed response. I’ll explain a bit about where I’m coming from with understanding the concept learning problem:
I typically think of concepts as probabilistic programs eventually bottoming out in sense data. So we have some “language” with a “library” of concepts (probabilistic generative models) that can be combined to create new concepts, and combinations of concepts are used to explain complex sensory data (for example, we might compose different generative models at different levels to explain a picture of a scene). We can (in theory) use probabilistic program induction to have uncertainty about how different concepts are combined. This seems like a type of swarm relaxation, due to probabilistic constraints being fuzzy. I briefly skimmed through the McClelland chapter and it seems to mesh well with my understanding of probabilistic programming. (A cartoon sketch of this view of concepts follows below.)
But, when thinking about how to create friendly AI, I typically use the very conservative assumptions of statistical learning theory, which give us guarantees against certain kinds of overfitting but no guarantee of proper behavior on novel edge cases. Statistical learning theory is certainly too pessimistic, but there isn’t any less pessimistic model for what concepts we expect to learn that I trust. While the view of concepts as probabilistic programs in the previous bullet point implies properties of the system other than those implied by statistical learning theory, I don’t actually have good formal models of these, so I end up using statistical learning theory.
I do think that figuring out if we can get more optimistic (but still justified) assumptions is good. You mention empirical experience with swarm relaxation as a possible way of gaining confidence that it is learning concepts correctly. Now that I think about it, bad handling of novel edge cases might be a form of “meta-overfitting”, and perhaps we can gain confidence in a system’s ability to deal with context shifts by having it go through a series of context shifts well without overfitting. This is the sort of thing that might work, and more research into whether it does is valuable, but it still seems worth preparing for the case where it doesn’t.
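Going back to the first point above, here is a cartoon of what I mean by concepts as composable probabilistic generative models, with crude rejection sampling standing in for real probabilistic program inference (the “library” and the observation are invented for illustration):

```python
import random

# A tiny "library" of concepts as generative functions.
def circle():
    return {"shape": "circle", "radius": random.uniform(0.5, 2.0)}

def square():
    return {"shape": "square", "side": random.uniform(0.5, 2.0)}

def scene():
    # A higher-level concept composed out of lower-level ones.
    return [random.choice([circle, square])() for _ in range(random.randint(1, 3))]

# Crude rejection sampling stands in for real probabilistic program inference:
# which scenes explain the observation "exactly two circles"?
def explains_observation(s):
    return len(s) == 2 and all(obj["shape"] == "circle" for obj in s)

samples = [scene() for _ in range(10_000)]
posterior = [s for s in samples if explains_observation(s)]
print(len(posterior), "matching scenes; e.g.", posterior[0] if posterior else None)
```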
Anyway, thanks for giving me some good things to think about. I think I see how a lot of our disagreements mostly come down to how much convergence we expect from different concept learning systems. For example, if “psychological manipulation” is in some sense a natural category, then of course it can be added as a weak (or even strong) constraint on the system. I’ll probably think about this a lot more and eventually write up something explaining reasons why we might or might not expect to get convergent concepts from different systems, and the degree to which this changes based on how value-laden a concept is.
There is a lot of talk that can be given about how that complex union takes place, but here is one very important takeaway: it can always be made to happen in such a way that there will not, in the future, be any Gotcha cases (those where you thought you did completely merge the two concepts, but where you suddenly find a peculiar situation where you got it disastrously wrong). The reason why you won’t get any Gotcha cases is that the concepts are defined by large numbers of weak constraints, and no strong constraints—in such systems, the effect of smaller and smaller numbers of concepts can be guaranteed to converge to zero. (This happens for the same reason that the effect of smaller and smaller sub-populations of the molecules in a gas will converge to zero as the population sizes go to zero).
I didn’t really understand a lot of what you said here. My current model is something like “if a concept is defined by lots of weak constraints, then lots of these constraints have to go wrong at once for the concept to go wrong, and we think this is unlikely due to induction and some kind of independence/uncorrelatedness assumption”; is this correct? If this is the right understanding, I think I have low confidence that errors in each weak constraint are in fact not strongly correlated with each other.
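As a toy illustration of why the independence assumption carries most of the weight here (all numbers invented): with genuinely independent weak constraints, a majority have to fail at once for the concept to fail, which is astronomically unlikely; with one shared cause, the failure probability is just the probability of that cause.

```python
import random

def concept_fails(n_constraints, p_err, correlated, threshold=0.5):
    if correlated:
        # One shared cause (e.g. a distributional shift) flips every constraint
        # into its error mode at the same time.
        shared_bad_context = random.random() < p_err
        errors = [shared_bad_context] * n_constraints
    else:
        errors = [random.random() < p_err for _ in range(n_constraints)]
    return sum(errors) / n_constraints > threshold

def failure_rate(correlated, trials=20_000):
    return sum(concept_fails(200, 0.1, correlated) for _ in range(trials)) / trials

print("independent constraint errors:", failure_rate(correlated=False))  # ~0.0
print("correlated constraint errors: ", failure_rate(correlated=True))   # ~0.1
```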
I think you have homed in exactly on the place where the disagreement is located. I am glad we got here so quickly (it usually takes a very long time, where it happens at all).
Yes, it is the fact that “weak constraint” systems have (supposedly) the property that they are making the greatest possible attempt to find a state of mutual consistency among the concepts, that leads to the very different conclusions that I come to, versus the conclusions that seem to inhere in logical approaches to AGI. The drastic difference between these two perspectives really cannot be overstated: this is not just a matter of two possible mechanisms, it is much more like a clash of paradigms (if you’ll forgive a cliche that I know some people absolutely abhor).
One way to summarize the difference is by imagining a sequence of AI designs, with progressive increases in sophistication. At the beginning, the representation of concepts is simple, the truth values are just T and F, and the rules for generating new theorems from the axioms are simple and rigid.
As the designs get better various new features are introduced … but one way to look at the progression of features is that constraints between elements of the system get more widespread, and more subtle in nature, as the types of AI become better and better.
An almost trivial example of what I mean: when someone builds a real-time reasoning engine in which there has to be a strict curtailment of the time spent doing certain types of searches in the knowledge base, a wise AI programmer will insert some sanity checks that kick in after the search has to be curtailed. The sanity checks are a kind of linkage from the inference being examined, to the rest of the knowledge that the system has, to see if the truncated reasoning left the system in a state where it concluded something that is patently stupid. These sanity checks are almost always extramural to the logical process—for which read: they are glorified kludges—but in a real world system they are absolutely vital. Now, from my point of view what these sanity checks do is act as weak constraints on one little episode in the behavior of the system.
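A caricature of what I mean, in code: the search is cut off by a time budget, and the sanity check afterwards acts as a weak constraint linking the truncated conclusion back to the rest of the system’s knowledge (everything here is invented purely for illustration).

```python
import time

def truncated_search(query, knowledge_base, budget_seconds=0.01):
    """Real-time reasoning: stop searching the knowledge base when time runs out."""
    best = None
    deadline = time.monotonic() + budget_seconds
    for candidate in knowledge_base.get(query, []):
        if time.monotonic() > deadline:
            break                          # curtailed: better answers may be missed
        if best is None or candidate["score"] > best["score"]:
            best = candidate
    return best

def sanity_check(conclusion, knowledge_base):
    """The 'glorified kludge': a weak constraint linking one reasoning episode
    back to the rest of the system's knowledge."""
    if conclusion is None:
        return False
    return not any(fact["negates"] == conclusion["claim"]
                   for fact in knowledge_base.get("background", []))

kb = {
    "is_the_stove_safe_to_touch": [{"claim": "yes", "score": 0.6}],
    "background": [{"negates": "yes"}],   # elsewhere the system knows the stove is on
}
answer = truncated_search("is_the_stove_safe_to_touch", kb)
print(answer, "passes sanity check:", sanity_check(answer, kb))
```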
Okay, so if you buy my suggestion that in practice AI systems become better, the more that they allow the little reasoning episodes to be connected to the rest of system by weak constraints, then I would like to go one step further and propose the following:
1) As a matter of fact, you can build AI systems (or, parts of AI systems) that take the whole “let’s connect everything up with weak constraints” idea to an extreme, throwing away almost everything else (all the logic!) and keeping only the huge population of constraints, and something amazing happens: the system works better that way. (An old classic example, but one which still has lessons to teach, is the very crude Interactive Activation model of word recognition. Seen in its historical context it was a bombshell, because it dumped all the procedural programming that people had thought was necessary to do word recognition from features, and replaced it with nothing-but-weak-constraints ... and it worked better than any procedural program was able to do. A miniature of that flavor is sketched just after point 2 below.)
2) This extreme attitude to the power of weak constraints comes with a price: you CANNOT have mathematical assurances or guarantees of correct behavior. Your new weak-constraint system might actually be infinitely more reliable and stable than any of the systems you could build, where there is a possibility to get some kind of mathematical guarantees of correctness or convergence, but you might never be able to prove that fact (except with some general talk about the properties of ensembles).
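Going back to the Interactive Activation example in point 1, here is a miniature of that nothing-but-weak-constraints flavor: position-specific letter evidence and word units tied together by excitatory support and mutual inhibition, settling by relaxation. This is only meant to convey the flavor; it is not the original model’s architecture or parameters.

```python
# Word units and position-specific letter evidence, tied together by excitatory
# support and mutual inhibition, settling by iterative relaxation.
words = {"CAT": ["C", "A", "T"], "CAR": ["C", "A", "R"], "RAT": ["R", "A", "T"]}

# Noisy bottom-up evidence for the letter in each of the three positions.
evidence = {0: {"C": 0.6, "R": 0.4}, 1: {"A": 1.0}, 2: {"T": 0.7, "R": 0.3}}

activation = {w: 0.0 for w in words}
for step in range(50):
    for w, letters in words.items():
        support = sum(evidence[i].get(l, 0.0) for i, l in enumerate(letters))
        inhibition = sum(activation[other] for other in words if other != w)
        # Each term is a weak constraint; no single rule decides the outcome.
        activation[w] += 0.1 * (support - 0.5 * inhibition - activation[w])

print(sorted(activation.items(), key=lambda kv: -kv[1]))  # "CAT" settles on top
```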
All of that is what is buried in the phrase I stole from Yann LeCun: the “unreasonable effectiveness” idea. These systems are unreasonably good at doing what they do. They shouldn’t be so good. But they are.
As you can imagine, this is such a huge departure from the traditional way of thinking in AI, that many people find it completely alien. Believe it or not, I know people who seem willing to go to any lengths to destroy the credibility of someone who suggests the idea that mathematical rigor might be a bad thing in AI, or that there are ways of doing AI that are better than the status quo, but which involve downgrading the role of mathematics to just technical-support level, rather than primacy.
--
On your last question, I should say that I was only referring to the fact that in systems of weak constraints, there is extreme independence between the constraints, and they are all relatively small, so it is hard for an extremely inconsistent ‘belief’ or ‘fact’ to survive without being corrected. This is all about the idea of “single point of failure” and its antithesis.
I briefly skimmed through the McClelland chapter and it seems to mesh well with my understanding of probabilistic programming.
I think it would not go amiss to read Vikash Mansinghka’s PhD thesis and the open-world generation paper to see a helpful probabilistic programming approach to these issues. In summary: we can use probabilistic programming to learn the models we need, use conditioning/query to condition the models on the constraints we intend to enforce, and then sample the resulting distributions to generate “actions” which are very likely to be “good enough” and very unlikely to be “bad”. We sample instead of inferring the maximum-a-posteriori action or expected action precisely because, as part of the Bayesian modelling process, we assume that the peak of our probability density does not necessarily correspond to an in-the-world optimum.
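To make the “sample, don’t take the MAP action” point concrete, here is a minimal sketch with an invented toy utility model (nothing here comes from the thesis itself; it is just an illustration of the satisficing idea):

```python
import math, random

# Invented toy model: ten candidate actions; the modelled utility peaks at action 7,
# but the model is imperfect, so committing to the exact peak is risky.
def modelled_utility(action):
    return 5.0 - abs(action - 7)

def action_weights(alpha=1.0):
    # Soft-conditioning on "the outcome is good" gives each action a weight.
    return [math.exp(alpha * modelled_utility(a)) for a in range(10)]

def sample_action(alpha=1.0):
    # Satisficing: sample in proportion to "probably good enough"
    # instead of committing to the maximum-a-posteriori action.
    return random.choices(range(10), weights=action_weights(alpha))[0]

print("MAP action:", max(range(10), key=modelled_utility))       # always 7
print("sampled actions:", [sample_action() for _ in range(20)])  # spread around 7
```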
I agree that choosing an action randomly (with higher probability for good actions) is a good way to create a fuzzy satisficer. Do you have any insights into how to:
1) Create queries for planning that don’t suffer from “wishful thinking”, with or without nested queries. Basically the problem is that if I want an action conditioned on receiving a high utility (e.g. we have a factor on the expected utility node U equal to e^(alpha * U)), then we are likely to choose high-variance actions while inferring that the rest of the model works out such that these actions return high utilities. (A toy example is sketched just after this list.)
2) Extend this to sequential planning without nested nested nested nested nested nested queries.
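Here is the toy example of the wishful-thinking problem promised above (numbers invented): with a factor e^(alpha * U) on the utility node, a strictly worse but high-variance action comes out preferred, because conditioning on high utility also conditions on the gamble having paid off.

```python
import math

# Safe action: utility 1 for sure.  Risky action: utility 8 with prob 0.1, else 0.
# Expected utilities: safe = 1.0, risky = 0.8 -- the risky action is strictly worse.
outcomes = {
    "safe":  [(1.0, 1.0)],               # (probability, utility)
    "risky": [(0.1, 8.0), (0.9, 0.0)],
}

def expected_utility(action):
    return sum(p * u for p, u in outcomes[action])

def conditioned_weight(action, alpha=2.0):
    # Planning by conditioning with a factor exp(alpha * U) on the utility node:
    # the action's weight is the expectation of exp(alpha * U) over its outcomes.
    return sum(p * math.exp(alpha * u) for p, u in outcomes[action])

for action in outcomes:
    print(action, "EU =", expected_utility(action),
          "conditioned weight =", round(conditioned_weight(action)))
# EU prefers "safe"; the conditioned weights overwhelmingly prefer "risky",
# because conditioning on high utility also "infers" that the gamble paid off.
```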
That concept spaces can be matched without gotchas is reassuring and may point in a direction in which AGI can be made friendly. If the concepts are suitably matched in your proposed checking modules. If. And if no other errors are made.
That’s a lot of stuff to read (apologies: my bandwidth is limited at the moment) but my first response on taking a quick glance through is that you mention reinforcement learning an awful lot …… and RL is just a disaster.
I absolutely do not accept the supposed “neuroscience” evidence that the brain uses RL. If you look into that evidence in detail, it turns out to be flimsy. There are two criticisms. First, virtually any circuit can be made to look like it has RL in it, if there is just a bit of feedback and some adaptation—so in that sense finding evidence for RL in some circuit is like saying “we found a bit of feedback and some adaptation”, which is a trivial result.
The second criticism of RL is that the original idea was that it operated at a high level in the system design. Finding RL features buried in the low level circuit behavior does not imply that it is present in any form whatsoever, in the high level design—e.g. at the concept level. This is for the same reason that we do not deduce, from the fact that computer circuits only use zeros and ones at the lowest level, that therefore they can only make statements about arithmetic if those statements contain only zeros and ones.
The net effect of these two observations, taken with the historical bankruptcy of RL in the psychology context, means that any attempt to use it in discussions of concepts, nowadays, seems empty.
I know that only addresses a tiny fraction of what you said, but at this point I am worried, you see: I do not know how much the reliance on RL will have contaminated the rest of what you have to say...
Thanks. You are right, I do rely on an RL assumption quite a lot, and it’s true that it has probably “contaminated” most of the ideas: if I were to abandon that assumption, I’d have to re-evaluate all of the ideas.
I admit that I haven’t dug very deeply into the neuroscience work documenting the brain using RL, so I don’t know to what extent the data really is flimsy. That said, I would be quite surprised if the brain didn’t rely strongly on RL. After all, RL is the theory of how an agent should operate in an initially unknown environment where the rewards and punishments have to be learned… which is very much the thing that the brain does.
Another thing that makes me assign confidence to the brain using RL principles is that I (and other people) have observed in people a wide range of peculiar behaviors that would make perfect sense if most of our behavior was really driven by RL principles. It would take me too long to properly elaborate on that, but basically it looks to me strongly like things like this would have a much bigger impact on our behavior than any amount of verbal-level thinking about what would be the most reasonable thing to do.
I don’t disagree with the general drift here. Not at all.
The place where I have issues is actually a little subtle (though not too much so). If RL appears in a watered-down form all over the cognitive system, as an aspect of the design, so to speak, this would be entirely consistent with all the stuff that you observe, and which I (more or less) agree with.
But where things get crazy is when it is seen as the core principle, or main architectural feature of the system. I made some attempts to express this in the earliest blog post on my site, but the basic story is that IF it is proposed as MAIN mechanism, all hell breaks loose. The reason is that for it to be a main mechanism it needs supporting machinery to find the salient stimuli, find plausible (salient) candidate responses, and it needs to package the connection between these in a diabolically simplistic scalar (S-R contingencies), rather than in some high-bandwidth structural relation. If you then try to make this work, a bizarre situation arises: so much work has to be done by all the supporting machinery, that it starts to look totally insane to insist that there is a tiny, insignificant little S-R loop at the center of it all!
That, really, is why behaviorism died in psychology. It was ludicrous to pretend that the supporting machinery was trivial. It wasn’t. And when people shifted their focus and started looking at the supporting machinery, they came up with …… all of modern cognitive psychology! The idea of RL just became irrelevant, and it shriveled away.
There is a whole book’s worth of substance in what happened back then, but I am not sure anyone can be bothered to write it, because all the cogn psych folks just want to get on with real science rather than document the dead theory that wasn’t working. Pity, because AI people need to read that nonexistent book.
Okay. In that case I think we agree. Like I mentioned in my reply to ChristianKI, I do feel that RL is an important mechanism to understand, but I definitely don’t think that you could achieve a very good understanding of the brain if you only understood RL. Necessary but not sufficient, as the saying goes.
Any RL system that we want to do something non-trivial needs to be able to apply the things it has learned in one state to other similar states, which in turn requires some very advanced learning algorithms to correctly recognize “similar” states. (I believe that’s part of the “supporting machinery” you referred to.) Having just the RL component doesn’t get you anywhere near intelligence by itself.
It would take me too long to properly elaborate on that, but basically it looks to me strongly like things like this would have a much bigger impact on our behavior than any amount of verbal-level thinking about what would be the most reasonable thing to do.
That seems to me like an argument from lack of imagination. The fact that reinforcement learning is the best among those you can easily imagine doesn’t mean that it’s the best overall.
If reinforcement learning were the prime way we learn, understanding Anki cards before you memorize them shouldn’t be as important as it is.
Having a card fail after 5 repetitions because the initial understanding wasn’t deep enough to build a foundation suggests that learning is about more than just reinforcing.
Creating the initial strong understanding of a card doesn’t feel to me like it’s about reinforcement learning.
On a theoretical level, reinforcement learning is basically behaviorism. It’s not like behaviorism never works, but modern cognitive behavior therapy moved beyond it. CBT does things that aren’t well explained by behaviorism.
You can get rid of a phobia via reinforcement learning but it takes a lot of time and gradual change. There are various published principles that are simply faster.
Pigeons manage to beat humans at a monty hall problem: http://www.livescience.com/6150-pigeons-beat-humans-solving-monty-hall-problem.html
The pigeons engage the problem with reinforcement learning which is in this case a good strategy.
Humans, on the other hand, don’t use that strategy and get different outcomes. To me that suggests a lot of high-level human thought is not about reinforcement learning.
Given our bigger brains, we should be able to beat the pigeons, or at least be as good as them, if we used the same strategy.
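A minimal sketch of what I mean by engaging the problem with reinforcement learning: a simple action-value learner over the stay/switch choice converges on switching (the learning rate and exploration rate are arbitrary).

```python
import random

def monty_hall_reward(switch):
    doors = [0, 1, 2]
    prize, pick = random.choice(doors), random.choice(doors)
    # Host opens a door that is neither the pick nor the prize.
    opened = random.choice([d for d in doors if d != pick and d != prize])
    final = [d for d in doors if d != pick and d != opened][0] if switch else pick
    return 1.0 if final == prize else 0.0

# Simple epsilon-greedy action-value learning over the two strategies.
q = {"stay": 0.0, "switch": 0.0}
for trial in range(5000):
    action = random.choice(list(q)) if random.random() < 0.1 else max(q, key=q.get)
    reward = monty_hall_reward(action == "switch")
    q[action] += 0.05 * (reward - q[action])

print(q)  # q["switch"] ends up near 2/3 and q["stay"] near 1/3: the learner switches
```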
Oh, I definitely don’t think that human learning would only rely on RL, or that RL would be the One Grand Theory Explaining Everything About Learning. (Human learning is way too complicated for any such single theory.) I agree that e.g. the Anki card example you mention requires more blocks to explain than RL.
That said, RL would help explain things like why many people’s efforts to study via Anki so easily fail, and why it’s important to make each card contain as little to recall as possible—the easier it is to recall the contents of a card, the better the effort/reward ratio, and the more likely that you’ll remain motivated to continue studying the cards.
You also mention CBT. One of the basic building blocks of CBT is the ABC model, where an Activating Event is interpreted via a subconscious Belief, leading to an emotional Consequence. Where do those subconscious Beliefs come from? The full picture is quite complicated (see appraisal theory, the more theoretical and detailed version of the ABC model), but I would argue that at least some of the beliefs look like they could be produced by something like RL.
As a simple example, someone once tried to rob me at a particular location, after which I started being afraid of taking the path leading through that location. The ABC model would describe this as saying that the Activating event is (the thought of) that location, the Belief is that that location is dangerous, and the Consequence of that belief is fear and a desire to avoid that location… or, almost equivalently, you could describe that as an RL process having once received a negative reward at that particular location, and therefore assigning a negative value to that location since that time.
That said, I did reason that even though it had happened once, I’d just been unlucky on that time and I knew on other grounds that that location was just as safe as any other. So I forced myself to take that path anyway, and eventually the fear vanished. So you’re definitely right that we also have brain mechanisms that can sometimes override the judgments produced by the RL process. But I expect that even their behavior is strongly shaped by RL elements… e.g. if I had tried to make myself walk that path several times and failed on each time, I would soon have acquired the additional Belief that trying to overcome that fear is useless, and given up.
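To spell out the RL description of that example (with purely illustrative numbers): one strongly negative experience drives the location’s value down, and repeated safe visits gradually bring it back toward neutral.

```python
# Illustrative value-learning account of the example above: one strongly negative
# experience at the location, followed by repeated safe visits.
value_of_location = 0.0
learning_rate = 0.3

def update(value, reward):
    return value + learning_rate * (reward - value)

value_of_location = update(value_of_location, reward=-10.0)      # the attempted robbery
print("after the robbery:", value_of_location)                   # -3.0: fear and avoidance

for safe_visit in range(10):                                     # forcing myself to take the path
    value_of_location = update(value_of_location, reward=0.0)
print("after repeated safe visits:", round(value_of_location, 2))  # near 0: fear extinguishes
```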
I think it is very important to consider the difference between a descriptive model and a theory of a mechanism.
So, inventing an extreme example for purposes of illustration, if someone builds a simple, two-parameter model of human marital relationships (perhaps centered on the idea of cost and benefits), that model might actually be made to work, to a degree. It could be used to do some pretty simple calculations about how many people divorce, at certain income levels, or with certain differences in income between partners in a marriage.
But nobody pretends that the mechanism inside the descriptive model corresponds to an actual mechanism inside the heads of those married couples. Sure, there might be!, but there doesn’t have to be, and we are pretty sure there is no actual calculation inside a particular mechanism, that matches the calculation in the model. Rather, we believe that reality involves a much more complex mechanism that has that behavior as an emergent property.
When RL is seen as a descriptive model—which I think is the correct way to view it in your above example, that is fine and good as far as it goes.
The big trouble that I have been fighting is the apotheosis from descriptive model to theory of a mechanism. And since we are constructing mechanisms when we do AI, that is an especially huge danger that must be avoided.
I agree that this is an important distinction, and that things that might naively seem like mechanisms are often actually closer to descriptive models.
I’m not convinced that RL necessarily falls into the class of things that should be viewed mainly as descriptive models, however. For one, what’s possibly the most general-purpose AI developed so far seems to have been developed by explicitly having RL as an actual mechanism. That seems to me like a moderate data point towards RL being an actual useful mechanism and not just a description.
Though I do admit that this isn’t necessarily that strong of a data point—after all, SHRDLU was once the most advanced system of its time too, yet basically all of its mechanisms turned out to be useless.
So I forced myself to take that path anyway, and eventually the fear vanished.
The fact that you don’t consciously notice fear doesn’t mean that it’s completely gone. It still might raise your pulse a bit. Physiological responses in general persist longer.
To the extent that you removed the fear, I do agree that doing exposure therapy is driven by RL. On the other hand, it’s slow.
I don’t think you need a belief to have a working Pavlovian trigger. When playing around with anchoring in NLP I don’t think that a physical anchor is well described as working via a belief.
Beliefs seem to me separate entities. They usually exist as “language”/semantics.
When playing around with anchoring in NLP I don’t think that a physical anchor is well described as working via a belief. Beliefs seem to me separate entities. They usually exist as “language”/semantics.
I’m not familiar with NLP, so I can’t comment on this.
Do you have experience with other process oriented change work techniques? Be it alternative frameworks or CBT?
I think it’s very hard to reason about concepts like beliefs. We have a naive understanding of what the word means but there are a bunch of interlinked mental modules that don’t really correspond to naive language. Unfortunately they are also not easy to study apart from each other.
Having reference experiences of various corner cases seems to me to be required to get to grips with concepts.
Do you have experience with other process oriented change work techniques?
Not sure to what extent these count, but I’ve done various CFAR techniques, mindfulness meditation, and Non-Violent Communication (which I’ve noticed is useful not only for improving your communication, but also dissolving your own annoyances and frustrations even in private).
The pigeons engage the problem with reinforcement learning
How do you know? When a scientist rewards pigeons for learning, the fact that the pigeons learn doesn’t prove anything about how the pigeons are doing it.
Of course they are a black box and could in theory use a different method. On the other hand, their choices are comparable with the ones that an RL algorithm would make, while the humans’ choices are farther apart.
I agree with Richard Loosemore’s interpretation (but I am not familiar with the neuroscience he is referring to):
First, virtually any circuit can be made to look like it has RL in it, if there is just a bit of feedback and some adaptation—so in that sense finding evidence for RL in some circuit is like saying “we found a bit of feedback and some adaptation”, which is a trivial result.
The main point that I wanted to make wasn’t about pigeon intelligence, but that the heuristics humans use differ from RL results, and that in cases like this the pigeons produce results that are similar to RL, so it’s not a problem of cognitive resources.
The difference tells us something worthwhile about human reasoning.
Uhm. Is there any known experiment that has been tried which has failed with respect to RL?
In the sense: has there been an experiment where one says RL should predict X, but X did not happen? The lack of such a conclusive experiment would be some evidence in favor of RL. Provided, of course, that the lack of such an experiment is not due to other reasons, such as inability to design a proper test (indicating a lack of understanding of the properties of RL) or the experiment not happening due to real-world impracticalities (not enough attention having been cast on RL, not enough funding for a proper experiment to have been conducted, etc.)
In general scientists do a lot of experiments where they make predictions about learning and those predictions turn out to be false. That goes for predictions based on RL as well as prediction based on other models.
Wikipedia describes RL as:
Reinforcement learning is an area of machine learning inspired by behaviorist psychology, concerned with how software agents ought to take actions in an environment so as to maximize some notion of cumulative reward.
Given that it’s an area of machine learning, you usually don’t find psychologists talking about RL. They talk about behaviorism. There are tons of papers published on behaviorism, and after a while the cognitive revolution came along and most psychologists moved beyond RL.
Uhm, I kind of felt the pigeon experiment was a little misleading.
Yes, the pigeons did a great job of switching doors and learning through RL.
Human RL, however (it seems to me), takes place in a more subtle manner. While the pigeons seemed to focus on object-level productivity, human RL would seem to take a more complicated route.
But even that’s kind of beside the point.
In the article that Kaj had posted above, with Amy Sutherland trying the LRS on her husband, it was interesting to note that the RL was happening at a rather unconscious level. In the Monty Hall type of problem-solving cognition, the brain is working at a much more conscious, active level.
So it seems more than likely to me that while RL works in humans, it easily gets overridden, if you will, by conscious, deliberate action.
One other point is also worth noting in my opinion.
Human brains come with a lot more baggage than pigeon brains. Therefore, it is more than likely that humans have learned not to switch through years of reinforced learning. That makes it much harder to unlearn the same thing in a short period of time.
The pigeons, having less cognitive baggage, may have a lot less to unlearn, which may have made it easier for them to learn the switching pattern.
Also, I just realised that I didn’t quite answer your question. Sorry about that I got carried away in my argument.
But the answer is no, I don’t have anything specific in mind. Also, I don’t know enough about things like what effects RL will have on memory, preferences etc. But I kind of feel that I could design an experiment if I knew more about it.
If concepts are defined by large clusters of constraints between concepts [...] then you can always deal with situations in which two concepts seem near to one another but do not properly overlap.
I confess that it would take me some time to establish whether weak constraint systems of the sort I have in mind can be mapped onto normed linear spaces. I suspect not: this is more the business of partial orderings than of topological spaces.
To clarify what I was meaning in the above: if concept A is defined by a set A* of weak constraints over the set of concepts, and another concept B has a similar set B*, where A* and B* have substantial overlap, one can introduce new concepts that sit above the differences and act as translation concepts, with the result that eventually you can find a single concept Z that allows A and B to be seen as special cases of Z.
All of this is made less tractable because the weak constraints (1) do not have to be pairwise (although most of them probably will be), and (2) can belong to different classes, with different properties associated with them (so, the constraints themselves are not just links, they can have structure). It is for these reasons that I doubt whether this could easily be made to map onto theorems from topology.
Actually it turns out that my knowledge was a little rusty on one point, because apparently the topics of orderings and lattice theory are considered a sub-branch of general topology.
Hm. Does that mean that my reference is the right one? I’m explicitly asking because I still can’t reliably map your terminology (‘concept’, ‘translation’) to topological terms.
Oh no, that wasn’t where I was going. I was just making a small correction to something I said about orderings vs. topology. Not important.
The larger problem stands: concepts are active entities (for which read: they have structure, and they are adaptive, and their properties depend on mechanisms inside, with which they interact with other concepts). Some people use the word ‘concept’ to denote something very much simpler than that (a point in concept space, with perhaps a definable measure of distance to other concepts). If my usage were close to the latter, you might get some traction from using topology. But that really isn’t remotely true, so I do not think there is any way to make use of topology here.
But your reply still doesn’t answer my question: You claim that the concepts are stable and that a “no gotcha” result can be proven—and I assume mathematically proven. And for that I’d really like to see a reference to the relevant math as I want to integrate that into my own understanding of concepts that are ‘composed’ from vague features.
Yes to your link. And Hofstadter, of course, riffs on this idea continuously.
(It is fun, btw, to try to invent games in which ‘concepts’ are defined by more and more exotic requirements, then watch the mind as it gets used to the requirements and starts supplying you with instances).
As for “mathematically proven”: that is something I am still working on, but cannot get to yet (otherwise I would have published it already), because it involves being more specific about the relevant classes of concept mechanism. When the proof comes, it will be a statistical-mechanics-style proof, however.
The first thing to say about this, is that it is a conceptual minefield. The semantics (or ontological grounding) of AI systems is, in my opinion, one of the least-well developed parts of the whole field. People often pay lip-service to some kind of model-theoretical justification for an AI’s semantic foundations, but in practice this actually means very little, since the theoretical ideas shade off into philosophy, have some huge unresolved gaps in them, and frequently take recourse in infinitely large (i.e. uncomputable) mappings between sets of ‘possible worlds’. Worst of all, the area is rife with question-begging (like using technical vocabulary which itself has a poorly defined semantics to try to specify exactly what ‘semantics’ is!).
Why does that matter? Because many of the statements that people make about semantic issues (like the alien semantics problem) are predicated on precisely which semantic theory they subscribe to. And, it is usually the case that their chosen semantic theory is just a vague idea that goes somewhat in the direction of Tarski, or in the direction of Montague, or maybe just what they read in Russell and Norvig. The problem is that those semantic theories have challengers (some of them not very well defined, but even so...), such as Cognitive Semantics, and those other semantic formalisms have a truly gigantic impact on some of the issues we are discussing here.
So, for example, there is an interpretation of semantics that says that it is not even coherent to talk about two concept landscapes that are semantic aliens. To be sure, this can happen in language—things expressible in one language can be very hard to say in another language—but the idea that two concept spaces can be in some way irreconcilable, or untranslatable, would be incoherent (not “unlikely” but actually not possible).
Ah! Finally a tasty piece of real discussion! I’ve got a biiig question about this: how do these various semantic theories for AI/AGI take into account the statistical nature of real cognition?
(Also, I’m kicking myself for not finishing Plato’s Camera yet, because now I’m desperately wanting to reference it.)
Basically: in real cognition, semantics are gained from the statistical relationship between a model of some sort and feature data. There can be multiple “data-types” of feature data: one of the prominent features of the human brain is that once a concept is learned, it becomes more than a sum of training data; it becomes a map of a purely abstract, high-dimensional feature space (or, if you prefer, a distribution over that feature space), with the emphasis on the word abstract. The dimensions of that space are usually not feature data, but parameters of an abstract causal model, inferable from feature data. This makes our real concepts accessible through completely different sensory modalities.
Given all this knowledge about how the real brain works, and given that we definitively need AGI/FAI/whatever to work at least as well if not, preferably, better than the real human brain… how do semantic theories in the AI/AGI field fit in with all this statistics? How do you turn statistics into model-theoretic semantics of a formal logic system?
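Here is one way to make that picture concrete: a minimal sketch in plain Python, where the learned concept is a distribution over abstract parameters and the same parameters can be inferred from two different sensory modalities. The “cup” concept, its parameters, and all the numbers are invented for illustration; nothing here is anyone’s actual model.

    import random

    # Toy "concept" = a distribution over abstract parameters (size, roundness),
    # not over raw sense data.  The same abstract parameters generate features in
    # two different modalities, so the concept is accessible from either one.
    def sample_cup_params():
        return {"size_cm": random.gauss(9.0, 1.5), "roundness": random.gauss(0.8, 0.1)}

    def render_visual(p):    # abstract params -> noisy visual features
        return {"apparent_height": p["size_cm"] + random.gauss(0, 0.5),
                "contour_curvature": p["roundness"] + random.gauss(0, 0.05)}

    def render_tactile(p):   # the same abstract params -> noisy tactile features
        return {"grip_width": 0.8 * p["size_cm"] + random.gauss(0, 0.5),
                "edge_smoothness": p["roundness"] + random.gauss(0, 0.05)}

    def infer_size(observation, render, n=20000, tol=0.6):
        """Crude rejection-style inference of the abstract size parameter from
        whichever modality produced `observation`."""
        accepted = []
        for _ in range(n):
            p = sample_cup_params()
            sim = render(p)
            if all(abs(sim[k] - observation[k]) < tol for k in observation):
                accepted.append(p["size_cm"])
        return sum(accepted) / len(accepted) if accepted else None

    true_params = {"size_cm": 10.0, "roundness": 0.75}
    print(infer_size(render_visual(true_params), render_visual))    # estimate from vision alone
    print(infer_size(render_tactile(true_params), render_tactile))  # similar estimate from touch alone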
Ack, I wish people didn’t ask such infernally good questions, so much! ;-)
Your question is good, but the answer is not really going to satisfy. There is an entire book on this subject, detailing the relationship between purely abstract linguistics-oriented theories of semantics, the more abstractly mathematical theories of semantics, the philosophical approach (which isn’t called “semantics” of course: that is epistemology), and the various (rather weak and hand-wavy) ideas that float around in AI. One thing it makes a big deal of is the old (but still alive) chestnut of the Grounding Problem.
The book pulls all of these things together and analyzes them in the context of the semantics that is actually used by the only real thinking systems on the planet right now (at least, the only ones that want to talk about semantics), and then it derives conclusions and recommendations for how all of that can be made to knit together.
Yup, you’ve guessed it.
That book doesn’t exist. There is not (in my opinion anyway) anything that even remotely comes close to it.
What you said about the statistical nature of real cognition would be considered, in cognitive psychology, as just one perspective on the issue: alas, there are many.
At this point I can only say that my despair at the hugeness of this issue leaves me with nothing much more to say, except that I am trying to write that book, but I might never get around to it. In the meantime I can only try, for my part, to write some answers to more specific questions within that larger whole.
How do the statistically-oriented theories of pragmatics and the linguistic theories of semantics go together?
Math semantics, in the denotational and operational senses, I kinda understand: you demonstrate the semantics of a mathematical system by providing some outside mathematical object which models it. This also works for CS semantics, but does come with the notion that we include ⊥ (bottom) as an element of our denotational domains and that our semantics may bottom out in “the machine does things”, ie: translation to opcodes.
The philosophical approach seems to wave words around like they’re not talking about how to make words mean things, or go reference the mathematical approach. I again wish to reference Plato’s Camera, and go with Domain Portrayal Semantics. That at least gives us a good guess to talk about how and why symbol grounding makes sense, as a feature of cognition that must necessarily happen in order for a mind to work.
What you said about the statistical nature of real cognition would be considered, in cognitive psychology, as just one perspective on the issue: alas, there are many.
Nonetheless, it is considered one of the better-supported hypotheses in cognitive science and theoretical neuroscience.
There are really two aspects to semantics: grounding and compositionality. Elementary distinction, of course, but with some hidden subtlety to it … because many texts focus on one of them and do a quick wave of the hand at the other (it is usually the grounding aspect that gets short shrift, while the compositionality aspect takes center stage).
[Quick review for those who might need it: grounding is the question of how (among other things) the basic terms of your language or concept-encoding system map onto “things in the world”, whereas compositionality is how it is that combinations of basic terms/concepts can ‘mean’ something in such a way that the meaning of a combination can be derived from the meaning of the constituents plus the arrangement of the constituents.]
So, having said that, a few observations.
Denotational and operational semantics of programming languages or formal systems ….. well, there we have a bit of a closed universe, no? And things get awfully (deceptively) easy when we drop down into closed universes. (As Winograd and the other Blocks Worlds enthusiasts realized rather quickly). You hinted at that with your comment when you said:
… and that our semantics may bottom out in “the machine does things”, ie: translation to opcodes.
We can then jump straight from too simple to ridiculously abstract, finding ourselves listening to philosophical explanations of semantics, on which subject you said:
The philosophical approach seems to wave words around like they’re not talking about how to make words mean things...
Concisely put, and I am not sure I disagree (too much, at any rate).
Then we can jump sideways to psychology (and I will lump neuroscientists/neurophilosophers like Patricia Churchland in with the psychologists). I haven’t read any of PC’s stuff for quite a while, but Plato’s Camera does look to be above-average quality so I might give it a try. However, looking at the link you supplied I was able to grok where she was coming from with Domain Portrayal Semantics, and I have to say that there are some problems with that. (She may deal with the problems later, I don’t know, so take the following as provisional.)
Her idea of a Domain Portrayal Semantics is very static: just a state-space divide-and-conquer, really. The problem with that is that in real psychological contexts people often regard concepts as totally malleable in all sorts of ways. They shift the boundaries around over time, in different contexts, and with different attitudes. So, for example, I can take you into my workshop which is undergoing renovation at the moment and, holding in my hand a takeout meal for you and the other visitors, I can say “find some chairs, a lamp, and a dining table”. There are zero chairs, lamps and dining tables in the room. But, faced with the takeout that is getting cold, you look around and find (a) a railing sticking out of the wall, which becomes a chair because you can kinda sit on it, (b) a blowtorch that can supply light, and (c) a tea chest with a pile of stuff on it, from which the stuff can be removed to make a dining table. All of those things can be justifiably called chairs, tables, and lamps because of their functionality.
I am sure her idea could be extended to allow for this kind of malleability, but the bottom line is that you then build your semantics on some very shifty sort of sand, not the rock that maybe everyone was hoping for.
(I have to cut off this reply to go do a task. Hopefully get back to it later).
Plato’s Camera is well above average for a philosophy-of-mind book, but I still think it focuses too thoroughly on relatively old knowledge about what we can do with artificial neural networks, both supervised and unsupervised. My Kindle copy includes angry notes to the effect of, “If you claim we can do linear transformations on vector-space ‘maps’ to check by finding a homomorphism when they portray the same objective feature-domain, how the hell can you handle Turing-complete domains!? The equivalence of lambda expressions is undecidable!”
This is why I’m very much a fan of the probabilistic programming approach to computational cognitive science, which clears up these kinds of issues. In a probabilistic programming setting, the probability of extensional equality for two models (where models are distributions over computation traces) is a dead simple and utterly normal query: it’s just p(X == Y), where X and Y are taken to be models (aka: thunk lambdas, aka: distributions from which we can sample). The undecidable question is thus shunted aside in favor of a check that is merely computationally intensive, but can ultimately be done in a bounded-rational way.
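For concreteness, here is a minimal sketch in plain Python standing in for a real probabilistic programming language; the two toy models are invented. It reads p(X == Y) in the simplest way, as the probability that independent runs of the two programs return the same value, estimated by sampling rather than decided logically.

    import random

    # Two "models", represented as thunks: distributions over return values.
    # They are written differently but happen to denote the same distribution.
    def model_x():
        return sum(random.randint(0, 1) for _ in range(2))    # Binomial(2, 0.5)

    def model_y():
        r = random.random()
        return 0 if r < 0.25 else (2 if r > 0.75 else 1)      # same distribution

    def p_equal(x, y, n=100_000):
        """Monte Carlo estimate of p(X == Y): an ordinary query, not a decision
        procedure for program equivalence."""
        return sum(x() == y() for _ in range(n)) / n

    print(p_equal(model_x, model_y))   # about 0.375 for independent draws from these two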
My reaction to those simple neural-net accounts of cognition is similar, in that I wanted very much to overcome their (pretty glaring) limitations. I wasn’t so much concerned with inability to handle Turing complete domains, as other more practical issues. But I came to a different conclusion about the value of probabilistic programming approaches, because that seems to force the real world to conform to the idealized world of a branch of mathematics, and, like Leonardo, I don’t like telling Nature what she should be doing with her designs. ;-)
Under the heading of ‘interesting history’ it might be worth mentioning that I hit my first frustration with neural nets at the very time that it was bursting into full bloom—I was part of the revolution that shook cognitive science in the mid to late 1980s. Even while it was in full swing, I was already going beyond it. And I have continued on that path ever since. Tragically, the bulk of NN researchers stayed loyal to the very simplistic systems invented in the first blush of that spring, and never seemed to really understand that they had boxed themselves into a dead end.
But I came to a different conclusion about the value of probabilistic programming approaches, because that seems to force the real world to conform to the idealized world of a branch of mathematics, and, like Leonardo, I don’t like telling Nature what she should be doing with her designs. ;-)
And I have continued on that path ever since. Tragically, the bulk of NN researchers stayed loyal to the very simplistic systems invented in the first blush of that spring, and never seemed to really understand that they had boxed themselves into a dead end.
Could you explain the kinds of neural networks beyond the standard feedforward, convolutional, and recurrent supervised networks? In particular, I’d really appreciate hearing a connectionist’s view on how unsupervised neural networks can learn to convert low-level sensory features into the kind of more abstracted, “objectified” (in the sense of “made objective”) features that can be used for the bottom, most concrete layer of causal modelling.
Ah, but Nature’s elegant design for an embodied creature is precisely a bounded-Bayesian reasoner! You just minimize the free energy of the environment.
Yikes! No. :-)
That paper couldn’t be a more perfect example of what I meant when I said
that seems to force the real world to conform to the idealized world of a branch of mathematics
In other words, the paper talks about a theoretical entity which is a descriptive model (not a functional model) of one aspect of human decision-making behavior. That means you cannot jump to the conclusion that this is “nature’s design for an embodied creature”.
About your second question. I can only give you an overview, but the essential ingredient is that to go beyond the standard neural nets you need to consider neuron-like objects that are actually free to be created and destroyed like processes on a network, and which interact with one another using more elaborate, generalized versions of the rules that govern simple nets.
From there it is easy to get to unsupervised concept building because the spontaneous activity of these atoms (my preferred term) involves searching for minimum-energy* configurations that describe the world.
There is actually more than one type of ‘energy’ being simultaneously minimized in the systems I work on.
You can read a few more hints of this stuff in my 2010 paper with Trevor Harley (which is actually on a different topic, but I threw in a sketch of the cognitive system for purposes of illustrating my point in that paper).
Reference:
Loosemore, R.P.W. & Harley, T.A. (2010). Brains and Minds: On the Usefulness of Localisation Data to Cognitive Psychology. In M. Bunzl & S.J. Hanson (Eds.), Foundational Issues of Neuroimaging. Cambridge, MA: MIT Press. http://richardloosemore.com/docs/2010a_BrainImaging_rpwl_tah.pdf
There is a lot of talk that can be given about how that complex union takes place, but here is one very important takeaway: it can always be made to happen in such a way that there will not, in the future, be any Gotcha cases (those where you thought you did completely merge the two concepts, but where you suddenly find a peculiar situation where you got it disastrously wrong). The reason why you won’t get any Gotcha cases is that the concepts are defined by large numbers of weak constraints, and no strong constraints—in such systems, the effect of smaller and smaller numbers of concepts can be guaranteed to converge to zero.
That is an interesting aspect of one particular way to deal with the problem, one that I have not yet heard about, and I’d like to see a reference so I can read up on it.
I first started trying to explain, informally, how these types of systems could work back in 2005. The reception was so negative that it led to a nasty flame war.
I have continued to work on these systems, but there is a problem with publishing too much detail about them. The very same mechanisms that make the motivation engine a safer type of beast (as described above) also make the main AGI mechanisms extremely powerful. That creates a dilemma: talk about the safety issues, and almost inevitably I have to talk about the powerful design. So, I have given some details in my published papers, but the design is largely under wraps, being developed as an AGI project, outside the glare of publicity.
I am still trying to find ways to write a publishable paper about this class of systems, and when/if I do I will let everyone know about it. In the meantime, much of the core technology is already described in some of the references that you will find in my papers (including the one above). The McClelland and Rumelhart reference, in particular, talks about the fundamental ideas behind connectionist systems. There is also a good paper by Hofstadter called “Jumbo” which illustrates another simple system that operates with multiple weak constraints. Finally, I would recommend that you check out Geoff Hinton’s early work.
In all your neural net reading, it is important to stay above the mathematical details and focus on the ideas, because the math is a distraction from the more important message.
I first read McClelland and Rumelhart ~20 years ago, and it has a prominent place on my bookshelf. I haven’t been able to work actively in AI, but I have followed the field. I put some hope in integrated connectionist-symbolic systems and was rewarded with deep neural networks lately. I think that every advanced system will need some non-symbolic approach to integrate reality. I don’t know whether it will be NNs or some other statistical means. And the really tricky part will be to figure out how to pre-wire it such that it ‘does what it should’. I think a lot will be learned from how the same is realized in the human brain.
Something like deep learning is likely to produce concepts that are very difficult for humans to understand, while probabilistic programming might produce more transparent models. How easy it is to make transparent AGI (compared to opaque AGI) is an open question.
Maybe I’m biased as an open proponent of probabilistic programming, but I think only the latter can produce AGI at all: the former would not only result in opaque AGI, it basically can’t result in a successful real-world AGI at all.
I don’t think you can get away from the need to do hierarchical inference on complex models in Turing-complete domains (in short: something very like certain models expressible in probabilistic programming). A deep neural net is basically just drawing polygons in a hierarchy of feature spaces, and hoping your polygons have enough edges to approximate the shape you really mean but not so many edges that they take random noise in the training data to be part of the shape—given just the right conditions, it can approximate the right thing, but it can’t even describe how to do the right thing in general.
A sufficiently advanced AI’s concept space might contain a similar concept. But how do we pinpoint this concept in the AI’s concept space?
Why does everyone suppose that there are a thousand different ways to learn concepts (ie: classifiers), but no normatively correct way for an AI to learn concepts? It seems strange to me that we think we can only work with a randomly selected concept-learning algorithm or the One Truly Human Concept-Learning Algorithm, but can’t say when the human is wrong.
We can do something like list a bunch of examples, have humans label them, and then find the lowest Kolmogorov complexity concept that agrees with human judgments in, say, 90% of cases. I’m not sure if this is what you mean by “normatively correct”, but it seems like a plausible concept that multiple concept learning algorithms might converge on. I’m still not convinced that we can do this for many value-laden concepts we care about and end up with something matching CEV, partially due to complexity of value. Still, it’s probably worth systematically studying the extent to which this will give the right answers for non-value-laden concepts, and then see what can be done about value-laden concepts.
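A minimal sketch of that procedure is below. True Kolmogorov complexity is uncomputable, so a crude description length over a tiny hypothesis class stands in for it; the features, labels, noise level and the 90% threshold are all made up for illustration.

    import random

    # Toy stand-in for "find the lowest-complexity concept that agrees with >= 90%
    # of human judgments": threshold rules on one of two features, scored by a
    # crude description length in bits.
    random.seed(0)
    examples = [(random.uniform(0, 10), random.uniform(0, 10)) for _ in range(300)]
    def human_label(x):                       # noisy human judgments of a simple rule
        return (x[0] > 4) if random.random() > 0.05 else not (x[0] > 4)
    labels = [human_label(e) for e in examples]

    def complexity(h):
        feature, threshold = h
        return 1 + threshold.bit_length()     # 1 bit for the feature, a few for the threshold

    def agreement(h):
        feature, threshold = h
        return sum((e[feature] > threshold) == y for e, y in zip(examples, labels)) / len(examples)

    candidates = [(f, t) for f in (0, 1) for t in range(11)]
    viable = [h for h in candidates if agreement(h) >= 0.9]
    print(min(viable, key=complexity), len(viable))   # simplest concept that agrees well enough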
We can do something like list a bunch of examples, have humans label them, and then find the lowest Kolmogorov complexity concept that agrees with human judgments in, say, 90% of cases.
Regularization is already a part of training any good classifier.
I’m not sure if this is what you mean by “normatively correct”, but it seems like a plausible concept that multiple concept learning algorithms might converge on.
Roughly speaking, I mean optimizing for the causal-predictive success of a generative model, given not only a training set but a “level of abstraction” (something like tagging the training features with lower-level concepts, type-checking for feature data) and a “context” (ie: which assumptions are being conditioned-on when learning the model).
Again, roughly speaking, humans tend to make pretty blatant categorization errors (ie: magical categories, non-natural hypotheses, etc.), but we also are doing causal modelling in the first place, so we accept fully-naturalized causal models as the correct way to handle concepts. However, we also handle reality on multiple levels of abstraction: we can think in chairs and raw materials and chemical treatments and molecular physics, all of which are entirely real. For something like FAI, I want a concept-learning algorithm that will look at the world in this naturalized, causal way (which is what normal modelling shoots for!), and that will model correctly at any level of abstraction or under any available set of features, and will be able to map between these levels as the human mind can.
Basically, I want my “FAI” to be built out of algorithms that can dissolve questions and do other forms of conceptual analysis without turning Straw Vulcan and saying, “Because ‘goodness’ dissolves into these other things when I naturalize it, it can’t be real!”. Because once I get that kind of conceptual understanding, it really does get a lot closer to being a problem of just telling the agent to optimize for “goodness” and trusting its conceptual inference to work out what I mean by that.
Sorry for rambling, but I think I need to do more cog-sci reading to clarify my own thoughts here.
Regularization is already a part of training any good classifier.
A technical point here: we don’t learn a raw classifier, because that would just learn human judgments. In order to allow the system to disagree with a human, we need to use some metric other than “is simple and assigns high probability to human judgments”.
For something like FAI, I want a concept-learning algorithm that will look at the world in this naturalized, causal way (which is what normal modelling shoots for!), and that will model correctly at any level of abstraction or under any available set of features, and will be able to map between these levels as the human mind can.
I totally agree that a good understanding of multi-level models is important for understanding FAI concept spaces. I don’t have a good understanding of multi-level maps; we can definitely see them as useful constructs for bounded reasoners, but it seems difficult to integrate higher levels into the goal system without deciding things about the high-level map a priori so you can define goals relative to this.
I don’t have a good understanding of multi-level maps; we can definitely see them as useful constructs for bounded reasoners
Well, all real reasoners are bounded reasoners. If you just don’t care about computational time bounds, you can run the Optimal Ordered Problem Solver as the initial input program to a Goedel Machine, and out pops your AI (in 200 trillion years, of course)!
it seems difficult to integrate higher levels into the goal system without deciding things about the high-level map a priori so you can define goals relative to this.
I would tend to say that you should be training a conceptual map of the world before you install anything like action-taking capability or a goal system of any kind. Of course, I also tend to say that you should just use a debugged (ie: cured of systematic faults) model of human evaluative processes for your goal system, and then use actual human evaluations to train the free parameters, and then set up learning feedback from the learned concept of “human” to the free-parameter space of the evaluation model.
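A very rough sketch of that ordering is below; every component is a trivial stand-in, and the feedback step from the learned concept of “human” back to the free parameters is omitted.

    import random

    # Sketch of the proposed ordering only: learn a predictive world model first,
    # with no goals or action-taking attached, and only then fit the free
    # parameters of an evaluation model to actual human evaluations.
    random.seed(0)

    # Phase 1: goal-free world model, learned just by watching transitions.
    transitions = {}
    state = 0
    for _ in range(5000):
        nxt = (state + random.choice([1, 1, 2])) % 10         # unknown toy environment
        transitions.setdefault(state, []).append(nxt)
        state = nxt
    world_model = {s: max(set(ns), key=ns.count) for s, ns in transitions.items()}

    # Phase 2: evaluation model with free parameters, fit to human ratings only
    # after the conceptual map exists.
    def human_rating(s):                      # stand-in for querying a human evaluator
        return 1.0 if s in (3, 4, 5) else 0.0
    weights = {s: 0.0 for s in range(10)}     # free parameters of the goal system
    for _ in range(2000):
        s = random.randrange(10)
        weights[s] += 0.05 * (human_rating(s) - weights[s])

    # Only at this point would action-taking machinery be attached, using
    # world_model to predict consequences and weights to evaluate them.
    print(world_model[0], round(weights[4], 2), round(weights[9], 2))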
I would tend to say that you should be training a conceptual map of the world before you install anything like action-taking capability or a goal system of any kind.
This seems like a sane thing to do. If it didn’t work, it would probably be for one of the following reasons:
1) Lack of conceptual convergence and human understandability; this seems somewhat likely and is probably the most important unknown.
2) Our conceptual representations are only efficient for talking about things we care about because we care about those things; a “neutral” standard such as resource-bounded Solomonoff induction would do a horrible job of learning the things we care about, for “no free lunch” reasons. I find this plausible but not too likely (it seems like it ought to be possible to “bootstrap” an importance metric for deciding where in the concept space to allocate resources).
3) We need the system to have a goal system in order to self-improve to the point of creating this conceptual map. I find this a little likely (this is basically the question of whether we can create something that manages to self-improve without needing goals; it is related to low impact).
Of course, I also tend to say that you should just use a debugged (ie: cured of systematic faults) model of human evaluative processes for your goal system, and then use actual human evaluations to train the free parameters, and then set up learning feedback from the learned concept of “human” to the free-parameter space of the evaluation model.
I agree that this is a good idea. It seems like the main problem here is that we need some sort of “skeleton” of a normative human model whose parts can be filled in empirically, and which will infer the right goals after enough training.
In order to allow the system to disagree with a human, we need to use some metric other than “is simple and assigns high probability to human judgments”.
Right: and the metric I would propose is, “counterfactual-prediction power”. Or in other words, the power to predict well in a causal fashion, to be able to answer counterfactual questions or predict well when we deliberately vary the experimental conditions.
To give a simple example: I train a system to recognize cats, but my training data contains only tabbies. What I want is a way of modelling that, while it may concentrate more probability on a tabby cat-like-thingy being a cat than a non-tabby cat-like-thingy, will still predict appropriately if I actually condition it on “but what if cats weren’t tabby by nature?”.
I think you said you’re a follower of the probabilistic programming approach, and in terms of being able to condition those models on counterfactual parameterizations and make predictions, I think they’re very much on the right track.
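A toy version of the cat example, in plain Python standing in for a probabilistic program (all probabilities invented): because the coat pattern is generated from cat-hood by its own mechanism, we can intervene on that mechanism to ask “what if cats weren’t tabby by nature?” without retraining the concept of cat.

    import random

    # Coat pattern is generated FROM cat-hood by a separate mechanism, so that
    # mechanism can be changed counterfactually without touching cat-hood itself.
    def sample_animal(p_tabby_given_cat=0.95):
        is_cat = random.random() < 0.3
        if is_cat:
            coat = "tabby" if random.random() < p_tabby_given_cat else "plain"
            whiskers = True
        else:
            coat = "tabby" if random.random() < 0.05 else "plain"
            whiskers = random.random() < 0.3
        return {"cat": is_cat, "coat": coat, "whiskers": whiskers}

    def p_cat_given(obs, n=100_000, **intervention):
        """p(cat | observed features), optionally under an intervention on the
        coat-pattern mechanism (the counterfactual parameterization)."""
        hits = cats = 0
        for _ in range(n):
            a = sample_animal(**intervention)
            if all(a[k] == v for k, v in obs.items()):
                hits += 1
                cats += a["cat"]
        return cats / hits if hits else float("nan")

    obs = {"coat": "plain", "whiskers": True}
    print(p_cat_given(obs))                           # low: a plain coat counts against cat-hood
    print(p_cat_given(obs, p_tabby_given_cat=0.05))   # much higher under the counterfactual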
If the AI starts to get even a hint that there are objections, it has to kick in a serious review of the plan. It will ask everyone (it is an AI, after all: it can do that even if there are 100 billion people on the planet). If it gets feedback from anyone saying that they object to the plan, that is the end of the story: it does not force anyone to go through with it. That means, it is a fundamental feature of the checking code that it will veto a plan under that circumstance. Notice, by the way, that I have generalized “consulting the programmers” to “consulting everyone”. That is an obvious extension, since the original programmers were only proxies for the will of the entire species.
This assumes that no human being would ever try to just veto everything to spite everyone else. A process for determining AGI volition that is even more overconstrained and impossible to get anything through than a homeowners’ association meeting sounds to me like a bad idea.
Another way of stating the problem is: if you revise a goal based on some evidence, then either you had some reason for doing this or not. If so, then this reason must be expressed relative to some higher goal, and we either never change this higher goal or (recursively) need to explain why we changed it. If not, then we need some other standard for choosing goals other than comparing them to a higher goal. I see no useful way of having a non-fixed supergoal.
I think the difference here is that, if only the supergoal is “wrong” but everything else about the system is highly optimized towards accomplishing the supergoal, then the system won’t stumble along the way, it will (by definition) do whatever accomplishes its supergoal well. So, “having the wrong supergoal” is quite different from most other reasoning errors in that it won’t actually prevent the AI from taking over the world.
It seems like you’re equating logical infallibility about facts (including facts about the world and mathematical facts) with logical infallibility about values. Of course any practical system will need to deal with uncertainty about the world and logic, probably using something like a weak constraint system. But it’s totally possible to create a system that has this sort of uncertainty without any uncertainty about its supergoal.
When you use the phrase “the best way to do this”, you are implicitly referring to some goal that weak constraint systems satisfy better than fixed-supergoal systems, but what sort of goal are we talking about here? If the original system had a fixed supergoal, then this will be exactly that fixed goal, so we’ll end up with a mishmash of the original goal and a weak constraint system that reconfigures the universe to satisfy the original goal.
I am going to have to respond piecemeal to your thoughtful comments, so apologies in advance if I can only get to a couple of issues in this first response.
Your first remark, which starts
contains a multitude of implicit assumptions about how the AI is built, and how the checking code would do its job, and my objection to your conclusion is buried in an array of objections to all of those assumptions, unfortunately. Let me try to bring some of them out into the light:
1) When you say
I am left wondering what kind of scenario you are picturing for the checking process. Here is what I had in mind. The AI can quickly assess the “forcefulness” of any candidate action plan by asking itself whether the plan will involve giving choices to people vs. forcing them to do something whether they like it or not. If a plan is of the latter sort, more care is needed, so it will canvass a sample of people to see if their reactions are positive or negative. It will also be able to model people (as it must be able to do, because all intelligent systems must be able to model the world pretty accurately or they don’t qualify as ‘intelligent’) so it will probably have a pretty shrewd idea already of whether people will react positively or negatively toward some intended action plan.
If the AI starts to get even a hint that there are objections, it has to kick in a serious review of the plan. It will ask everyone (it is an AI, after all: it can do that even if there are 100 billion people on the planet). If it gets feedback from anyone saying that they object to the plan, that is the end of the story: it does not force anyone to go through with it. That means, it is a fundamental feature of the checking code that it will veto a plan under that circumstance. Notice, by the way, that I have generalized “consulting the programmers” to “consulting everyone”. That is an obvious extension, since the original programmers were only proxies for the will of the entire species.
In all of that procedure I just described, why would the explanation of the plans to the people be problematic? People will ask questions about what the plans involve. If there is technical complexity, they will ask for clarification. If the plan is drastic there will be a world-wide debate, and some people who find themselves unable to comprehend the plan will turn to more expert humans for advice. And if even the most expert humans cannot understand the significance of the plan, what do you imagine would happen? I suggest that the most obvious reaction would be “Sorry, that plan is so obscure, and its consequences are so impossible for us to even understand, that we, a non-zero fraction of the human species, would like to invoke a precautionary principle and simply refuse to go ahead with it.”
That seems, to me at least, to get around the idea that there might be such a severe mismatch between human and AI understanding of the AI’s plans, that something bad would happen during the attempt to understand the plan.
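For concreteness, here is a schematic sketch of the checking procedure as described; every predicate is a hypothetical stand-in, and the point is only to make the control flow explicit: forceful plans trigger a canvass, a single objection vetoes, and incomprehension also vetoes.

    # Schematic sketch of the checking code described above (all predicates are
    # toy stand-ins).
    def check_plan(plan, population, is_forceful, explain, objects_to, understands):
        if not is_forceful(plan):
            return "proceed"
        explanation = explain(plan)
        if not any(understands(person, explanation) for person in population):
            return "veto: plan too obscure, precautionary principle invoked"
        if any(objects_to(person, explanation) for person in population):
            return "veto: at least one person objects"
        return "proceed"

    # Toy usage: one objector is enough to block a forceful plan.
    people = ["alice", "bob", "carol"]
    verdict = check_plan(
        "dopamine-drip everyone", people,
        is_forceful=lambda plan: True,
        explain=lambda plan: plan,
        objects_to=lambda person, expl: person == "bob",
        understands=lambda person, expl: True,
    )
    print(verdict)   # veto: at least one person objects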
In other words, your opening comment
seems to have been 100% addressed by the procedure I just described: if the plans could not be explained, the checking code would simply accept that the will of the people prevails even when they say “We decline on the grounds that we cannot understand the complexity or implications of your plans.”
I see I have only gotten as far as the very first sentence of your comment, but although I have many more points that I could deploy in response to the rest, doesn’t that close the case, since you said that it would work?
Thanks for your response.
So, I think this touches on the difficult part. As humans, we have a good idea of what “giving choices to people” vs. “forcing them to do something” looks like. This concept would need to resolve some edge cases, such as putting psychological manipulation in the “forceful” category (even though it can be done with only text). A sufficiently advanced AI’s concept space might contain a similar concept. But how do we pinpoint this concept in the AI’s concept space? Very likely, the concept space will be very complicated and difficult for humans to understand. It might very well contain concepts that look a lot like the “giving choices to people” vs. “forcing them to do something” distinction on multiple examples, but are different in important ways. We need to pinpoint it in order to make this concept part of the AI’s decision-making procedure.
This seems pretty similar to Paul’s idea of a black-box human in the counterfactual loop. I think this is probably a good idea, but the two problems here are (1) setting up this (possibly counterfactual) interaction in a way that it approves a large class of good plans and rejects almost all bad plans (see the next section), and (2) having a good way to predict the outcome of this interaction usually without actually performing it. While we could say that (2) will be solved by virtue of the superintelligence being a superintelligence, in practice we’ll probably get AGI before we get uploads, so we’ll need some sort of semi-reliable way to predict humans without actually simulating them. Additionally, the AI might need to self-improve to be anywhere near smart enough to consider this complex hypothetical, and so we’ll need some kind of low-impact self-improvement system. Again, I think this is probably a good idea, but there are quite a lot of issues with it, and we might need to do something different in practice. Paul has written about problems with black-box approaches based on predicting counterfactual humans here and here. I think it’s a good idea to develop both black-box solutions and white-box solutions, so we are not over-reliant on the assumptions involved in one or the other.
What language will people’s questions about the plans be in? If it’s a natural language, then the AI must be able to translate its concept space into the human concept space, and we have to solve a FAI-complete problem to do this. If it’s a more technical language, then humans themselves must be able to look at the AI’s concept space and understand it. Whether this is possible very much depends on how transparent the AI’s concept space is. Something like deep learning is likely to produce concepts that are very difficult for humans to understand, while probabilistic programming might produce more transparent models. How easy it is to make transparent AGI (compared to opaque AGI) is an open question.
We should also definitely be wary of a decision rule of the form “find a plan that, if explained to humans, would cause humans to say they understand it”. Since people are easy to manipulate, raw optimization for this objective will produce psychologically manipulative plans that people will incorrectly approve of. There needs to be some way to separate “optimize for the plan being good” from “optimize for people thinking the plan is good when it is explained to them”, or else some way of ensuring that humans’ judgments about these plans are accurate.
Again, it’s quite plausible that the AI’s concept space will contain some kind of concept that distinguishes between these different types of optimization; however, humans will need to understand the AI’s concept space in order to pinpoint this concept so it can be integrated into the AI’s decision rule.
I should mention that I don’t think that these black-box approaches to AI control are necessarily doomed to failure; rather, I’m pointing out that there are lots of unresolved gaps in our knowledge of how they can be made to work, and it’s plausible that they are too difficult in practice.
I see where you are coming from in what you have just said, but to give a good answer I need to take a high-level stance toward what you are saying. This is because there is a theme running through your ideas, here, and it is the theme, rather than the specifics, that I need to address.
You have mentioned on several occasions the idea that “AGI-concepts” and “Human-concepts” might not align, with the result that we might have difficulty understanding what the AGI really means when it uses a given concept. In particular, you use the idea that there could be some bad misalignments of concepts—for example, when the AGI makes a conceptual distinction between “giving choices to people” and “forcing them to do something”, and even though our own version of that same distinction corresponds closely to the AGI’s version most of the time, there are some peculiar circumstances (edge cases) where there is a massive or unexpectedly sharp discrepancy.
Putting this idea in the form of an exaggerated, fictional example, it is as if we meet a new culture out in the middle of Darkest Africa, and in the course of translating their words into ours we find a verb that seems to mean “cook”. But even though there are many examples (cooking rice, cooking bread, cooking meat, and even brewing a cup of tea) that seem to correspond quite closely, we suddenly find that they ALSO use this to refer to a situation where someone writes their initials on a tree, and another case where they smash someone’s head with a rock. And the natives claim that this is not because the new cases are homonyms, they claim that this is the very same concept in all cases.
We might call this a case of “alien semantics”.
The first thing to say about this, is that it is a conceptual minefield. The semantics (or ontological grounding) of AI systems is, in my opinion, one of the least-well developed parts of the whole field. People often pay lip-service to some kind of model-theoretical justification for an AI’s semantic foundations, but in practice this actually means very little, since the theoretical ideas shade off into philosophy, have some huge unresolved gaps in them, and frequently take recourse in infinitely large (i.e. uncomputable) mappings between sets of ‘possible worlds’. Worst of all, the area is rife with question-begging (like using technical vocabulary which itself has a poorly defined semantics to try to specify exactly what ‘semantics’ is!).
Why does that matter? Because many of the statements that people make about semantic issues (like the alien semantics problem) are predicated on precisely which semantic theory they subscribe to. And, it is usually the case that their chosen semantic theory is just a vague idea that goes somewhat in the direction of Tarski, or in the direction of Montague, or maybe just what they read in Russell and Norvig. The problem is that those semantic theories have challengers (some of them not very well defined, but even so...), such as Cognitive Semantics, and those other semantic formalisms have a truly gigantic impact on some of the issues we are discussing here.
So, for example, there is an interpretation of semantics that says that it is not even coherent to talk about two concept landscapes that are semantic aliens. To be sure, this can happen in language—things expressible in one language can be very hard to say in another language—but the idea that two concept spaces can be in some way irreconcilable, or untranslatable, would be incoherent (not “unlikely” but actually not possible).
[A brief word about how that could be the case. If concepts are defined by large clusters of constraints between concepts, rather than precise, atomic relations of the sort you find in logical formalisms, then you can always deal with situations in which two concepts seem near to one another but do not properly overlap: you can form some new, translator concepts that take a complex union of the two. There is a lot of talk that can be given about how that complex union takes place, but here is one very important takeaway: it can always be made to happen in such a way that there will not, in the future, be any Gotcha cases (those where you thought you did completely merge the two concepts, but where you suddenly find a peculiar situation where you got it disastrously wrong). The reason why you won’t get any Gotcha cases is that the concepts are defined by large numbers of weak constraints, and no strong constraints—in such systems, the effect of smaller and smaller numbers of concepts can be guaranteed to converge to zero. (This happens for the same reason that the effect of smaller and smaller sub-populations of the molecules in a gas will converge to zero as the population sizes go to zero). Finally, you will notice the appearance of the key phrase “large clusters of constraints” in what I just explained …… that should be familiar. This is precisely the semantics of those Swarm Relaxation systems that I talked about in the paper.]
So, one of the implications of that kind of semantics is that different intelligent systems that use the basic idea of massive, weak constraint clusters to build concepts will tend to converge on the same semantics.
[continued in next comment......]
With all of the above in mind, a quick survey of some of the things that you just said, with my explanation for why each one would not (or probably would not) be as much of an issue as you think:
For a massive-weak-constraint system, psychological manipulation would be automatically understood to be in the forceful category, because the concept of “psychological manipulation” is defined by a cluster of features that involve intentional deception, and since the “friendliness” concept would ALSO involve a cluster of weak constraints, it would include the extended idea of intentional deception. It would have to, because intentional deception is connected to doing harm, which is connected with unfriendliness, and so on.
Conclusion: that is not really an “edge” case in the sense that someone has to explicitly remember to deal with it.
We will not need to ‘understand’ the AGI’s concept space too much, if we are both using massive weak constraints, with convergent semantics. This point I addressed in more detail already.
What you are talking about here is the idea of simulating a human to predict their response. Now, humans already do this in a massive way, and they do not do it by making gigantic simulations, but just by doing simple modeling. And, crucially, they rely on the massive-weak-constraints-with-convergent-semantics (you can see now why I need to coin the concise term “Swarm Relaxation”) between the self and other minds to keep the problem manageable.
That particular idea—of predicting human response—was not critical to the argument that followed, however.
No, we would not have to solve a FAI-complete problem to do it. We will be developing the AGI from a baby state up to adulthood, keeping its motivation system in sync all the way up, and looking for deviations. So, in other words, we would not need to FIRST build the AGI (with potentially dangerous alien semantics), THEN do a translation between the two semantic systems, THEN go back and use the translation to reconstruct the motivation system of the AGI to make sure it is safe.
Much more could be said about the process of “growing” and “monitoring” the AGI during the development period, but suffice it to say that this process is extremely different if you have a Swarm Relaxation system vs. a logical system of the sort your words imply.
This hits the nail on the head. This comes under the heading of a strong constraint, or a point-source failure mode. The motivation system of a Swarm Relaxation system would not contain “decision rules” of that sort, precisely because they could have large, divergent effects on the behavior. If motivation is, instead, governed by large numbers of weak constraints, then in this case your decision rule would be seen as a type of deliberate deception, or manipulation, of the humans. And that contradicts a vast array of constraints that are consistent with friendliness.
Same as previous: with a design that does not use decision rules that are prone to point-source failure modes, the issue evaporates.
To summarize: much depends on an understanding of the concept of a weak constraint system. There are no really good readings I can send you (I know I should write one), but you can take a look at the introductory chapter of McClelland and Rumelhart that I gave in the references to the paper.
Also, there is a more recent reference to this concept, from an unexpected source. Yann LeCun has been giving some lectures on Deep Learning in which he came up with a phrase that could have been used two decades ago to describe exactly the sort of behavior to be expected from SA systems. He titles his lecture “The Unreasonable Effectiveness of Deep Learning”. That is a wonderful way to express it: swarm relaxation systems do not have to work (there really is no math that can tell you that they should be as good as they are), but they do. They are “unreasonably effective”.
There is a very deep truth buried in that phrase, and a lot of what I have to say about SA is encapsulated in it.
Okay, thanks a lot for the detailed response. I’ll explain a bit about where I’m coming from with understanding the concept learning problem:
I typically think of concepts as probabilistic programs eventually bottoming out in sense data. So we have some “language” with a “library” of concepts (probabilistic generative models) that can be combined to create new concepts, and combinations of concepts are used to explain complex sensory data (for example, we might compose different generative models at different levels to explain a picture of a scene). We can (in theory) use probabilistic program induction to have uncertainty about how different concepts are combined. This seems like a type of swarm relaxation, due to probabilistic constraints being fuzzy. I briefly skimmed through the McClelland chapter and it seems to mesh well with my understanding of probabilistic programming.
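As a concrete illustration of that picture, here is a toy “library” of generative concepts composed to explain data, with inference over which combination was used done by crude rejection sampling. Plain Python stands in for a probabilistic program, and the two shapes and all numbers are invented.

    import random

    # Toy library of concepts as generative programs; a composite concept (a
    # "scene") combines them, and explaining data is an ordinary posterior query.
    def circle():    return ("circle", random.gauss(5.0, 1.0))    # (kind, size)
    def square():    return ("square", random.gauss(3.0, 1.0))
    library = {"circle": circle, "square": square}

    def scene():
        """Composite concept: a scene is one or two shapes drawn from the library."""
        k = random.choice([1, 2])
        return [library[random.choice(list(library))]() for _ in range(k)]

    def explain(observed_sizes, n=50_000, tol=0.5):
        """Posterior over which shape-combination explains the observed sizes."""
        counts = {}
        for _ in range(n):
            s = scene()
            sizes = sorted(x[1] for x in s)
            if len(sizes) == len(observed_sizes) and all(
                    abs(a - b) < tol for a, b in zip(sizes, sorted(observed_sizes))):
                key = tuple(sorted(x[0] for x in s))
                counts[key] = counts.get(key, 0) + 1
        total = sum(counts.values())
        return {k: v / total for k, v in counts.items()} if total else {}

    print(explain([5.1, 2.9]))   # most of the mass on ('circle', 'square')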
But, when thinking about how to create friendly AI, I typically use the very conservative assumptions of statistical learning theory, which give us guarantees against certain kinds of overfitting but no guarantee of proper behavior on novel edge cases. Statistical learning theory is certainly too pessimistic, but there isn’t any less pessimistic model for what concepts we expect to learn that I trust. While the view of concepts as probabilistic programs in the previous bullet point implies properties of the system other than those implied by statistical learning theory, I don’t actually have good formal models of these, so I end up using statistical learning theory.
I do think that figuring out if we can get more optimistic (but still justified) assumptions is good. You mention empirical experience with swarm relaxation as a possible way of gaining confidence that it is learning concepts correctly. Now that I think about it, bad handling of novel edge cases might be a form of “meta-overfitting”, and perhaps we can gain confidence in a system’s ability to deal with context shifts by having it go through a series of context shifts well without overfitting. This is the sort of thing that might work, and more research into whether it does is valuable, but it still seems worth preparing for the case where it doesn’t.
Anyway, thanks for giving me some good things to think about. I think I see how a lot of our disagreements mostly come down to how much convergence we expect from different concept learning systems. For example, if “psychological manipulation” is in some sense a natural category, then of course it can be added as a weak (or even strong) constraint on the system.
I’ll probably think about this a lot more and eventually write up something explaining reasons why we might or might not expect to get convergent concepts from different systems, and the degree to which this changes based on how value-laden a concept is.
I didn’t really understand a lot of what you said here. My current model is something like “if a concept is defined by lots of weak constraints, then lots of these constraints have to go wrong at once for the concept to go wrong, and we think this is unlikely due to induction and some kind of independence/uncorrelatedness assumption”; is this correct? If this is the right understanding, I think I have low confidence that errors in each weak constraint are in fact not strongly correlated with each other.
I think you have homed in exactly on the place where the disagreement is located. I am glad we got here so quickly (it usually takes a very long time, where it happens at all).
Yes, it is the fact that “weak constraint” systems have (supposedly) the property that they are making the greatest possible attempt to find a state of mutual consistency among the concepts, that leads to the very different conclusions that I come to, versus the conclusions that seem to inhere in logical approaches to AGI. The drastic difference between these two perspectives really should not be underestimated: this is not just a matter of two possible mechanisms, it is much more like a clash of paradigms (if you’ll forgive a cliche that I know some people absolutely abhor).
One way to summarize the difference is by imagining a sequence of AI designs, with progressive increases in sophistication. At the beginning, the representation of concepts is simple, the truth values are just T and F, and the rules for generating new theorems from the axioms are simple and rigid.
As the designs get better various new features are introduced … but one way to look at the progression of features is that constraints between elements of the system get more widespread, and more subtle in nature, as the types of AI become better and better.
An almost trivial example of what I mean: when someone builds a real-time reasoning engine in which there has to be a strict curtailment of the time spent doing certain types of searches in the knowledge base, a wise AI programmer will insert some sanity checks that kick in after the search has to be curtailed. The sanity checks are a kind of linkage from the inference being examined, to the rest of the knowledge that the system has, to see if the truncated reasoning left the system in a state where it concluded something that is patently stupid. These sanity checks are almost always extramural to the logical process—for which read: they are glorified kludges—but in a real world system they are absolutely vital. Now, from my point of view what these sanity checks do is act as weak constraints on one little episode in the behavior of the system.
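A toy illustration of that kind of kludge (the knowledge base, the time budget, and the single sanity check are all invented): a time-boxed search can return a patently silly conclusion, and a cheap check drawn from the rest of the system’s knowledge acts as a weak constraint on that one episode.

    import time

    # Tiny knowledge base with a naive inheritance chain that yields a silly
    # conclusion ("tweety can fly") unless a sanity check vetoes it.
    knowledge_base = {("tweety", "is_a"): "penguin",
                      ("penguin", "is_a"): "bird",
                      ("bird", "can"): "fly"}

    def time_boxed_query(subject, relation, budget_s=0.01):
        """Follow is_a links looking for (x, relation); may stop early."""
        deadline = time.perf_counter() + budget_s
        current = subject
        while time.perf_counter() < deadline:
            if (current, relation) in knowledge_base:
                return knowledge_base[(current, relation)]
            current = knowledge_base.get((current, "is_a"))
            if current is None:
                return None
        return None    # search truncated by the time budget

    sanity_checks = [    # weak constraints drawn from elsewhere in the knowledge base
        lambda s, r, v: not (r == "can" and v == "fly" and
                             knowledge_base.get((s, "is_a")) == "penguin"),
    ]

    def answer(subject, relation):
        v = time_boxed_query(subject, relation)
        if v is not None and not all(chk(subject, relation, v) for chk in sanity_checks):
            return None    # conclusion is patently at odds with other knowledge
        return v

    print(answer("tweety", "can"))   # the naive chain says "fly"; the check vetoes it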
Okay, so if you buy my suggestion that in practice AI systems become better, the more that they allow the little reasoning episodes to be connected to the rest of system by weak constraints, then I would like to go one step further and propose the following:
1) As a matter of fact, you can build AI systems (or, parts of AI systems) that take the whole “let’s connect everything up with weak constraints” idea to an extreme, throwing away almost everything else (all the logic!) and keeping only the huge population of constraints, and something amazing happens: the system works better that way. (An old classic example, but one which still has lessons to teach, is the very crude Interactive Activation model of word recognition. Seen in its historical context it was a bombshell, because it dumped all the procedural programming that people had thought was necessary to do word recognition from features, and replaced it with nothing-but-weak-constraints …. and it worked better than any procedural program was able to do.) (A toy sketch of this kind of pure constraint relaxation appears just below, after point 2.)
2) This extreme attitude to the power of weak constraints comes with a price: you CANNOT have mathematical assurances or guarantees of correct behavior. Your new weak-constraint system might actually be infinitely more reliable and stable than any of the systems you could build, where there is a possibility to get some kind of mathematical guarantees of correctness or convergence, but you might never be able to prove that fact (except with some general talk about the properties of ensembles).
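To give a feel for what the pure constraint relaxation of point 1 looks like in code, here is a toy two-word network in the spirit of (though far cruder than) the Interactive Activation model; all of the weights and numbers are invented:

```python
import numpy as np

# Three "letter" units (bottom-up evidence) and two competing "word" units.
letters = ["c", "a", "t"]
words = ["cat", "car"]

W = np.array([      # letter -> word support
    [0.3,  0.3],    # "c" supports both words
    [0.3,  0.3],    # "a" supports both words
    [0.3, -0.2],    # "t" supports "cat", conflicts with "car"
])
word_inhibition = -0.4          # words compete with each other

letter_act = np.array([1.0, 1.0, 1.0])   # we saw the letters c, a, t
word_act = np.zeros(len(words))

for _ in range(200):            # relax toward a mutually consistent state
    net_input = letter_act @ W + word_inhibition * (word_act.sum() - word_act)
    word_act += 0.1 * (net_input - word_act)      # leaky integration
    word_act = np.clip(word_act, 0.0, 1.0)

print(dict(zip(words, word_act.round(2))))   # "cat" ends up dominant
```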
All of that is what is buried in the phrase I stole from Yann LeCun: the “unreasonable effectiveness” idea. These systems are unreasonably good at doing what they do. They shouldn’t be so good. But they are.
As you can imagine, this is such a huge departure from the traditional way of thinking in AI, that many people find it completely alien. Believe it or not, I know people who seem willing to go to any lengths to destroy the credibility of someone who suggests the idea that mathematical rigor might be a bad thing in AI, or that there are ways of doing AI that are better than the status quo, but which involve downgrading the role of mathematics to just technical-support level, rather than primacy.
--
On your last question, I should say that I was only referring to the fact that in systems of weak constraints, there is extreme independence between the constraints, and they are all relatively small, so it is hard for an extremely inconsistent ‘belief’ or ‘fact’ to survive without being corrected. This is all about the idea of “single point of failure” and its antithesis.
I think it would not go amiss to read Vikash Mansinghka’s PhD thesis and the open-world generation paper to see a helpful probabilistic programming approach to these issues. In summary: we can use probabilistic programming to learn the models we need, use conditioning/query to condition the models on the constraints we intend to enforce, and then sample the resulting distributions to generate “actions” which are very likely to be “good enough” and very unlikely to be “bad”. We sample instead of inferring the maximum-a-posteriori action or expected action precisely because, as part of the Bayesian modelling process, we assume that the peak of our probability density does not necessarily correspond to an in-the-world optimum.

I agree that choosing an action randomly (with higher probability for good actions) is a good way to create a fuzzy satisficer. Do you have any insights into how to:
create queries for planning that don’t suffer from “wishful thinking”, with or without nested queries (a toy version of this setup is sketched after this list). Basically the problem is that if I want an action conditioned on receiving a high utility (e.g. we have a factor on the expected utility node U equal to e^(alpha * U)), then we are likely to choose high-variance actions while inferring that the rest of the model works out such that these actions return high utilities
extend this to sequential planning without nested nested nested nested nested nested queries
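To pin down what I mean by the wishful-thinking problem in the first item, here is a toy rejection-sampling version of planning-as-inference; it is my own construction with made-up numbers, not anything from the thesis:

```python
import math, random
random.seed(0)

ALPHA = 3.0   # sharpness of the exp(alpha * U) factor

def outcome(action):
    """Toy world model: 'safe' has a reliable payoff, 'risky' has a lower
    mean payoff but much higher variance."""
    return random.gauss(1.0, 0.1) if action == "safe" else random.gauss(0.5, 2.0)

def sample_action():
    """Planning as inference by rejection sampling: propose (action, outcome)
    pairs from the prior and accept in proportion to exp(ALPHA * utility).
    (exp(ALPHA * 4.0) is just a crude upper bound used for the rejection step.)"""
    while True:
        action = random.choice(["safe", "risky"])
        utility = outcome(action)
        if random.random() < math.exp(ALPHA * utility) / math.exp(ALPHA * 4.0):
            return action

counts = {"safe": 0, "risky": 0}
for _ in range(2000):
    counts[sample_action()] += 1

# Almost every accepted sample is "risky": conditioning on high utility also
# conditions on the world having been lucky, which is exactly the wishful thinking.
print(counts)
```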
That concept spaces can be matched without gotchas is reassuring, and may point in a direction in which AGI can be made friendly. If the concepts are suitably matched in your proposed checking modules. If. And if no other errors are made.
Re: concepts, I’d be curious to hear any thoughts you might have on any part of my concept safety posts.
That’s a lot of stuff to read (apologies: my bandwidth is limited at the moment) but my first response on taking a quick glance through is that you mention reinforcement learning an awful lot …… and RL is just a disaster.
I absolutely do not accept the supposed “neuroscience” evidence that the brain uses RL. If you look into that evidence in detail, it turns out to be flimsy. There are two criticisms. First, virtually any circuit can be made to look like it has RL in it, if there is just a bit of feedback and some adaptation—so in that sense finding evidence for RL in some circuit is like saying “we found a bit of feedback and some adaptation”, which is a trivial result.
The second criticism of RL is that the original idea was that it operated at a high level in the system design. Finding RL features buried in the low level circuit behavior does not imply that it is present in any form whatsoever, in the high level design—e.g. at the concept level. This is for the same reason that we do not deduce, from the fact that computer circuits only use zeros and ones at the lowest level, that therefore they can only make statements about arithmetic if those statements contain only zeros and ones.
The net effect of these two observations, taken with the historical bankruptcy of RL in the psychology context, means that any attempt to use it in discussions of concepts, nowadays, seems empty.
I know that only addresses a tiny fraction of what you said, but at this point I am worried, you see: I do not know how much the reliance on RL will have contaminated the rest of what you have to say ……
Thanks. You are right, I do rely on an RL assumption quite a lot, and it’s true that it has probably “contaminated” most of the ideas: if I were to abandon that assumption, I’d have to re-evaluate all of the ideas.
I admit that I haven’t dug very deeply into the neuroscience work documenting the brain using RL, so I don’t know to what extent the data really is flimsy. That said, I would be quite surprised if the brain didn’t rely strongly on RL. After all, RL is the theory of how an agent should operate in an initially unknown environment where the rewards and punishments have to be learned… which is very much the thing that the brain does.
Another thing that makes me assign confidence to the brain using RL principles is that I (and other people) have observed in people a wide range of peculiar behaviors that would make perfect sense if most of our behavior was really driven by RL principles. It would take me too long to properly elaborate on that, but basically it looks to me strongly like things like this would have a much bigger impact on our behavior than any amount of verbal-level thinking about what would be the most reasonable thing to do.
I don’t disagree with the general drift here. Not at all.
The place where I have issues is actually a little subtle (though not too much so). If RL appears in a watered-down form all over the cognitive system, as an aspect of the design, so to speak, this would be entirely consistent with all the stuff that you observe, and which I (more or less) agree with.
But where things get crazy is when it is seen as the core principle, or main architectural feature, of the system. I made some attempts to express this in the earliest blog post on my site, but the basic story is that IF it is proposed as the MAIN mechanism, all hell breaks loose. The reason is that for it to be a main mechanism it needs supporting machinery to find the salient stimuli, find plausible (salient) candidate responses, and it needs to package the connection between these in a diabolically simplistic scalar (S-R contingencies), rather than in some high-bandwidth structural relation. If you then try to make this work, a bizarre situation arises: so much work has to be done by all the supporting machinery, that it starts to look totally insane to insist that there is a tiny, insignificant little S-R loop at the center of it all!
That, really, is why behaviorism died in psychology. It was ludicrous to pretend that the supporting machinery was trivial. It wasn’t. And when people shifted their focus and started looking at the supporting machinery, they came up with …… all of modern cognitive psychology! The idea of RL just became irrelevant, and it shriveled away.
There is a whole book’s worth of substance in what happened back then, but I am not sure anyone can be bothered to write it, because all the cog psych folks just want to get on with real science rather than document the dead theory that wasn’t working. Pity, because AI people need to read that nonexistent book.
Okay. In that case I think we agree. Like I mentioned in my reply to ChristianKI, I do feel that RL is an important mechanism to understand, but I definitely don’t think that you could achieve a very good understanding of the brain if you only understood RL. Necessary but not sufficient, as the saying goes.
Any RL system that we want to do something non-trivial with needs to be able to apply the things it has learned in one state to other, similar states, which in turn requires some very advanced learning algorithms to correctly recognize “similar” states. (I believe that’s part of the “supporting machinery” you referred to.) Having just the RL component doesn’t get you anywhere near intelligence by itself.
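As a toy illustration of why recognizing “similar” states matters (everything here is made up): a tabular learner has no opinion at all about a state it has never visited, while even a crude feature-based learner carries what it learned over to unseen states.

```python
import numpy as np

rng = np.random.default_rng(1)

def reward(state):
    x, y = state
    return (x + y) / 18.0               # payoff grows toward the (9, 9) corner

def features(state):
    x, y = state
    return np.array([x / 9.0, y / 9.0, 1.0])   # a crude, hand-built notion of similarity

tabular = {}        # state -> value; treats every state as unrelated to every other
w = np.zeros(3)     # linear approximator: V(s) is roughly w . features(s)
alpha = 0.05

for _ in range(5000):                   # training visits only the lower-left corner
    s = (int(rng.integers(0, 5)), int(rng.integers(0, 5)))
    r = reward(s)
    tabular[s] = tabular.get(s, 0.0) + alpha * (r - tabular.get(s, 0.0))
    w += alpha * (r - w @ features(s)) * features(s)

unseen = (8, 8)
print("tabular:", tabular.get(unseen, "no idea"))              # never visited
print("features:", round(float(w @ features(unseen)), 2))      # close to the true 0.89
```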
That seems to me like an argument from lack of imagination. The fact that reinforcement learning is the best among those you can easily imagine doesn’t mean that it’s the best overall.
If reinforcement learning were the primary way we learn, understanding Anki cards before you memorize them shouldn’t be as important as it is. Having a card fail after 5 repetitions because the initial understanding wasn’t deep enough to build a foundation suggests that learning is about more than just reinforcing. Creating the initial strong understanding of a card doesn’t feel to me like it’s about reinforcement learning.
On a theoretical level, reinforcement learning is basically behaviorism. It’s not like behaviorism never works, but modern cognitive behavior therapy moved beyond it. CBT does things that aren’t well explainable with behaviorism.
You can get rid of a phobia via reinforcement learning but it takes a lot of time and gradual change. There are various published principles that are simply faster.
Pigeons manage to beat humans at a Monty Hall problem: http://www.livescience.com/6150-pigeons-beat-humans-solving-monty-hall-problem.html The pigeons engage the problem with reinforcement learning, which is in this case a good strategy. Humans, on the other hand, don’t use that strategy and get different outcomes. To me that suggests a lot of high-level human thought is not about reinforcement learning.
Given our bigger brains, we should be able to beat the pigeons, or at least be as good as them, if we used the same strategy.
Oh, I definitely don’t think that human learning would only rely on RL, or that RL would be the One Grand Theory Explaining Everything About Learning. (Human learning is way too complicated for any such single theory.) I agree that e.g. the Anki card example you mention requires more blocks to explain than RL.
That said, RL would help explain things like why many people’s efforts to study via Anki so easily fail, and why it’s important to make each card contain as little to recall as possible—the easier it is to recall the contents of a card, the better the effort/reward ratio, and the more likely that you’ll remain motivated to continue studying the cards.
You also mention CBT. One of the basic building blocks of CBT is the ABC model, where an Activating Event is interpreted via a subconscious Belief, leading to an emotional Consequence. Where do those subconscious Beliefs come from? The full picture is quite complicated (see appraisal theory, the more theoretical and detailed version of the ABC model), but I would argue that at least some of the beliefs look like they could be produced by something like RL.
As a simple example, someone once tried to rob me at a particular location, after which I started being afraid of taking the path leading through that location. The ABC model would describe this as saying that the Activating event is (the thought of) that location, the Belief is that that location is dangerous, and the Consequence of that belief is fear and a desire to avoid that location… or, almost equivalently, you could describe that as a RL process having once received a negative reward at that particular location, and therefore assigning a negative value to that location since that time.
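In code, the bare-bones version of that description might look something like the following toy value update (invented for illustration, and certainly not a claim about the brain’s actual implementation):

```python
# One bad experience pulls the learned value of a location sharply negative;
# many later uneventful walks only slowly pull it back toward neutral.
value = {"park_path": 0.0, "main_street": 0.0}
learning_rate = 0.3

def experience(location, reward):
    value[location] += learning_rate * (reward - value[location])

experience("park_path", -10.0)            # the attempted robbery
print(round(value["park_path"], 2))       # about -3.0: avoid that path

for _ in range(10):                       # forcing myself to walk it anyway
    experience("park_path", 0.0)          # nothing bad happens
print(round(value["park_path"], 2))       # the learned aversion decays toward 0
```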
That said, I did reason that even though it had happened once, I’d just been unlucky on that time and I knew on other grounds that that location was just as safe as any other. So I forced myself to take that path anyway, and eventually the fear vanished. So you’re definitely right that we also have brain mechanisms that can sometimes override the judgments produced by the RL process. But I expect that even their behavior is strongly shaped by RL elements… e.g. if I had tried to make myself walk that path several times and failed on each time, I would soon have acquired the additional Belief that trying to overcome that fear is useless, and given up.
I think it is very important to consider the difference between a descriptive model and a theory of a mechanism.
So, inventing an extreme example for purposes of illustration, if someone builds a simple, two-parameter model of human marital relationships (perhaps centered on the idea of cost and benefits), that model might actually be made to work, to a degree. It could be used to do some pretty simple calculations about how many people divorce, at certain income levels, or with certain differences in income between partners in a marriage.
But nobody pretends that the mechanism inside the descriptive model corresponds to an actual mechanism inside the heads of those married couples. Sure, there might be! But there doesn’t have to be, and we are pretty sure there is no actual calculation inside a particular mechanism that matches the calculation in the model. Rather, we believe that reality involves a much more complex mechanism that has that behavior as an emergent property.
When RL is seen as a descriptive model (which I think is the correct way to view it in your above example), that is fine and good as far as it goes.
The big trouble that I have been fighting is the apotheosis from descriptive model to theory of a mechanism. And since we are constructing mechanisms when we do AI, that is an especially huge danger that must be avoided.
I agree that this is an important distinction, and that things that might naively seem like mechanisms are often actually closer to descriptive models.
I’m not convinced that RL necessarily falls into the class of things that should be viewed mainly as descriptive models, however. For one, what’s possibly the most general-purpose AI developed so far seems to have been developed by explicitly having RL as an actual mechanism. That seems to me like a moderate data point towards RL being an actual useful mechanism and not just a description.
Though I do admit that this isn’t necessarily that strong of a data point—after all, SHRDLU was once the most advanced system of its time too, yet basically all of its mechanisms turned out to be useless.
Arrgghh! No. :-)
The DeepMind Atari agent is the “most general-purpose AI developed so far”?
!!!
At this point your reply is “I am not joking. And don’t call me Shirley.”
The fact that you don’t consciously notice fear doesn’t mean that it’s completely gone. It still might raise your pulse a bit. Physiological responses in general stay longer.
To the extent that you removed the fear, I do agree that in that case doing exposure therapy is driven by RL. On the other hand, it’s slow.
I don’t think you need a belief to have a working Pavlovian trigger. When playing around with anchoring in NLP, I don’t think that a physical anchor is well described as working via a belief. Beliefs seem to me to be separate entities. They usually exist as “language”/semantics.
I’m not familiar with NLP, so I can’t comment on this.
Do you have experience with other process oriented change work techniques? Be it alternative frameworks or CBT?
I think it’s very hard to reason about concepts like beliefs. We have a naive understanding of what the word means but there are a bunch of interlinked mental modules that don’t really correspond to naive language. Unfortunately they are also not easy to study apart from each other.
Having reference experiences of various corner cases seems to me to be required to get to grips with concepts.
Not sure to what extent these count, but I’ve done various CFAR techniques, mindfulness meditation, and Non-Violent Communication (which I’ve noticed is useful not only for improving your communication, but also dissolving your own annoyances and frustrations even in private).
Do you think that resolving an emotion frustration via NVC is done via reinforcement learning?
No.
How do you know? When a scientist rewards pigeons for learning, the fact that the pigeons learn doesn’t prove anything about how the pigeons are doing it.
Of course they are a black box and could in theory use a different method. On the other hand, their choices are comparable with the ones that an RL algorithm would make, while the humans’ choices are farther apart.
I agree with Richard Loosemore’s interpretation (but I am not familiar with the neuroscience he is referring to):
The main point that I wanted to make wasn’t about pigeon intelligence, but that the heuristics humans use differ from RL results, and that in cases like this the pigeons produce results that are similar to RL; therefore it’s not a problem of cognitive resources.
The difference tells us something worthwhile about human reasoning.
Uhm. Is there any known experiment that has been tried which has failed with respect to RL?
In the sense, has there been an experiment where one says RL should predict X, but X did not happen? The lack of such a conclusive experiment would be some evidence in favor of RL. Provided, of course, that the lack of such an experiment is not due to other reasons, such as inability to design a proper test (indicating a lack of understanding of the properties of RL) or the experiment not happening due to real-world impracticalities (not enough attention having been cast on RL, not enough funding for a proper experiment to have been conducted, etc.)
In general scientists do a lot of experiments where they make predictions about learning and those predictions turn out to be false. That goes for predictions based on RL as well as prediction based on other models.
Wikipedia describes RL as:
Given that it’s an area of machine learning, you usually don’t find psychologists talking about RL. They talk about behaviorism. There are tons of papers published on behaviorism, and after a while the cognitive revolution came along and most psychologists moved beyond RL.
Not quite true, especially not if you count neuroscientists as psychologists. There have been quite a few papers by psychologists and neuroscientists talking about reinforcement learning in the last few years alone.
It appears to me that ChristianKI just listed four. Did you have something specific in mind?
Uhm, I kind of felt the pigeon experiment was a little misleading.
Yes, the pigeons did a great job of switching doors and learning through RL.
Human RL however (seems to me) takes place in a more subtle manner. While the pigeons seemed to focus on a more object-level productivity, human RL would seem to take a more complicated route.
But even that’s kind of beside the point.
In the article that Kaj had posted above, with Amy Sutherland trying the LRS on her husband, it was an interesting point to note that the RL was happening at a rather unconscious level. In the Monty Hall problem-solving type of cognition, the brain is working at a much more conscious, active level.
So it seems more than likely to me that while RL works in humans, it gets easily over-ridden, if you will, by conscious deliberate action.
One other point is also worth noting in my opinion.
Human brains come with a lot more baggage than pigeon brains. Therefore, it is more than likely that humans have learnt not to switch through years of reinforced learning. That makes it much harder to unlearn the same thing in a shorter period of time. The pigeons, having less cognitive load, may have a lot less to unlearn, which may have made it easier for them to learn the switching pattern.
Also, I just realised that I didn’t quite answer your question. Sorry about that I got carried away in my argument.
But the answer is no, I don’t have anything specific in mind. Also, I don’t know enough about things like what effects RL will have on memory, preferences etc. But I kind of feel that I could design an experiment if I knew more about it.
Am I correct that this refers to topological convergence results like those in section 2.8 in this ref?: http://www.ma.utexas.edu/users/arbogast/appMath08c.pdf
I confess that it would take me some time to establish whether weak constraint systems of the sort I have in mind can be mapped onto normed linear spaces. I suspect not: this is more the business of partial orderings than it is topological spaces.
To clarify what I was meaning in the above: if concept A is defined by a set S_A of weak constraints that are defined over the set of concepts, and another concept B has a similar set S_B, where S_B and S_A have substantial overlap, one can introduce new concepts that sit above the differences and act as translation concepts, with the result that eventually you can find a single concept Z that allows A and B to be seen as special cases of Z.
All of this is made less tractable because the weak constraints (1) do not have to be pairwise (although most of them probably will be), and (2) can belong to different classes, with different properties associated with them (so, the constraints themselves are not just links, they can have structure). It is for these reasons that I doubt whether this could easily be made to map onto theorems from topology.
Thanks for your answer. I trust your knowledge. I just want to read up on the math behind that.
Actually it turns out that my knowledge was a little rusty on one point, because apparently orderings and lattice theory are considered a sub-branch of general topology.
Small point, but I wanted to correct myself.
Hm. Does that mean that my reference is the right one? I’m explicitly asking because I still can’t reliably map your terminology (‘concept’, ‘translation’) to topological terms.
Oh no, that wasn’t where I was going. I was just making a small correction to something I said about orderings vs. topology. Not important.
The larger problem stands: concepts are active entities (for which read: they have structure, and they are adaptive, and their properties depend on mechanisms inside, with which they interact with other concepts). Some people use the word ‘concept’ to denote something very much simpler than that (a point in concept space, with perhaps a definable measure of distance to other concepts). If my usage were close to the latter, you might get some traction from using topology. But that really isn’t remotely true, so I do not think there is any way to make use of topology here.
I think I recognize what you mean from something I wrote in 2007 about the vagueness of concepts:
http://web.archive.org/web/20120121185331/http://grault.net/adjunct/index.cgi?VaguesDependingOnVagues (note the wayback-link; the original site no longer exists).
But your reply still doesn’t answer my question: You claim that the concepts are stable and that a “no gotcha” result can be proven—and I assume mathematically proven. And for that I’d really like to see a reference to the relevant math, as I want to integrate that into my own understanding of concepts that are ‘composed’ from vague features.
Yes to your link. And Hofstadter, of course, riffs on this idea continuously.
(It is fun, btw, to try to invent games in which ‘concepts’ are defined by more and more exotic requirements, then watch the mind as it gets used to the requirements and starts supplying you with instances).
When I was saying mathematically proven, this is something I am still working on, but cannot get there yet (otherwise I would have published it already) because it involves being more specific about the relevant classes of concept mechanism. When the proof comes it will be a statistical-mechanics-style proof, however.
OK. Now I understand what kind of proof you mean. Thank you for your answer and your passion. Also thanks for the feedback on my old post.
Ah! Finally a tasty piece of real discussion! I’ve got a biiig question about this: how do these various semantic theories for AI/AGI take into account the statistical nature of real cognition?
(Also, I’m kicking myself for not finishing Plato’s Camera yet, because now I’m desperately wanting to reference it.)
Basically: in real cognition, semantics are gained from the statistical relationship between a model of some sort, and feature data. There can be multiple “data-types” of feature data: one of the prominent features of the human brain is that once a concept is learned, it becomes more than a sum of training data, but instead a map of a purely abstract, high-dimensional feature space (or, if you prefer, a distribution over that feature space), with the emphasis being on the word abstract. The dimensions of that space are usually not feature data, but parameters of an abstract causal model, inferrable from feature data. This makes our real concepts accessible through completely different sensory modalities.
Given all this knowledge about how the real brain works, and given that we definitively need AGI/FAI/whatever to work at least as well if not, preferably, better than the real human brain… how do semantic theories in the AI/AGI field fit in with all this statistics? How do you turn statistics into model-theoretic semantics of a formal logic system?
Ack, I wish people didn’t ask such infernally good questions, so much! ;-)
Your question is good, but the answer is not really going to satisfy. There is an entire book on this subject, detailing the relationship between purely abstract linguistics-oriented theories of semantics, the more abstractly mathematical theories of semantics, the philosophical approach (which isn’t called “semantics” of course: that is epistemology), and the various (rather weak and hand-wavy) ideas that float around in AI. One thing it makes a big deal of is the old (but still alive) chestnut of the Grounding Problem.
The book pulls all of these things together and analyzes them in the context of the semantics that is actually used by the only real thinking systems on the planet right now (at least, the only ones that want to talk about semantics), and then it derives conclusions and recommendations for how all of that can be made to knit together.
Yup, you’ve guessed it.
That book doesn’t exist. There is not (in my opinion anyway) anything that even remotely comes close to it.
What you said about the statistical nature of real cognition would be considered, in cognitive psychology, as just one perspective on the issue: alas, there are many.
At this point in time I can only say that my despair at the hugeness of this issue leaves me with nothing much more to say, except that I am trying to write that book, but I might never get around to it. And in the meantime I can only try, for my part, to write some answers to more specific questions within that larger whole.
Ok, let me continue to ask questions.
How do the statistically-oriented theories of pragmatics and the linguistic theories of semantics go together?
Math semantics, in the denotational and operational senses, I kinda understand: you demonstrate the semantics of a mathematical system by providing some outside mathematical object which models it. This also works for CS semantics, but does come with the notion that we include ⊥ (“bottom”) as an element of our denotational domains and that our semantics may bottom out in “the machine does things”, ie: translation to opcodes.

The philosophical approach seems to wave words around like they’re not talking about how to make words mean things, or go reference the mathematical approach. I again wish to reference Plato’s Camera, and go with Domain Portrayal Semantics. That at least gives us a good guess to talk about how and why symbol grounding makes sense, as a feature of cognition that must necessarily happen in order for a mind to work.
Nonetheless, it is considered one of the better-supported hypotheses in cognitive science and theoretical neuroscience.
Fair enough.
There are really two aspects to semantics: grounding and compositionality. Elementary distinction, of course, but with some hidden subtlety to it … because many texts focus on one of them and do a quick wave of the hand at the other (it is usually the grounding aspect that gets short shrift, while the compositionality aspect takes center stage).
[Quick review for those who might need it: grounding is the question of how (among other things) the basic terms of your language or concept-encoding system map onto “things in the world”, whereas compositionality is how it is that combinations of basic terms/concepts can ‘mean’ something in such a way that the meaning of a combination can be derived from the meaning of the constituents plus the arrangement of the constituents.]
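To make the compositionality half concrete, here is a purely schematic toy in which the meaning of a two-word phrase is computed from the meanings of its constituents plus their arrangement (the lexicon and the little world are invented):

```python
# The 'meaning' of "modifier head" is computed from the meanings of the two
# words plus their arrangement; meanings are just predicates over a tiny world.
world = [
    {"kind": "ball", "color": "red"},
    {"kind": "ball", "color": "blue"},
    {"kind": "box",  "color": "red"},
]

lexicon = {
    "red":  lambda thing: thing["color"] == "red",
    "blue": lambda thing: thing["color"] == "blue",
    "ball": lambda thing: thing["kind"] == "ball",
    "box":  lambda thing: thing["kind"] == "box",
}

def meaning(phrase):
    """Compositional rule: modifier + head denotes the conjunction of the two."""
    modifier, head = phrase.split()
    return lambda thing: lexicon[modifier](thing) and lexicon[head](thing)

print([t for t in world if meaning("red ball")(t)])   # picks out just the red ball
```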
So, having said that, a few observations.
Denotational and operational semantics of programming languages or formal systems ….. well, there we have a bit of a closed universe, no? And things get awfully (deceptively) easy when we drop down into closed universes. (As Winograd and the other Blocks World enthusiasts realized rather quickly). You hinted at that with your comment when you said:
We can then jump straight from too simple to ridiculously abstract, finding ourselves listening to philosophical explanations of semantics, on which subject you said:
Concisely put, and I am not sure I disagree (too much, at any rate).
Then we can jump sideways to psychology (and I will lump neuroscientists/neurophilosophers like Patricia Churchland in with the psychologists). I haven’t read any of PC’s stuff for quite a while, but Plato’s Camera does look to be above-average quality so I might give it a try. However, looking at the link you supplied I was able to grok where she was coming from with Domain Portrayal Semantics, and I have to say that there are some problems with that. (She may deal with the problems later, I don’t know, so take the following as provisional.)
Her idea of a Domain Portrayal Semantics is very static: just a state-space divide-and-conquer, really. The problem with that is that in real psychological contexts people often regard concepts as totally malleable in all sorts of ways. They shift the boundaries around over time, in different contexts, and with different attitudes. So, for example, I can take you into my workshop which is undergoing renovation at the moment and, holding in my hand a takeout meal for you and the other visitors, I can say “find some chairs, a lamp, and a dining table”. There are zero chairs, lamps and dining tables in the room. But, faced with the takeout that is getting cold, you look around and find (a) a railing sticking out of the wall, which becomes a chair because you can kinda sit on it, (b) a blowtorch that can supply light, and (c) a tea chest with a pile of stuff on it, from which the stuff can be removed to make a dining table. All of those things can be justifiably called chairs, tables, and lamps because of their functionality.
I am sure her idea could be extended to allow for this kind of malleability, but the bottom line is that you then build your semantics on some very shifty sort of sand, not the rock that maybe everyone was hoping for.
(I have to cut off this reply to go do a task. Hopefully get back to it later).
Plato’s Camera is well above average for a philosophy-of-mind book, but I still think it focuses too thoroughly on relatively old knowledge about what we can do with artificial neural networks, both supervised and unsupervised. My Kindle copy includes angry notes to the effect of, “If you claim we can do linear transformations on vector-space ‘maps’ to check by finding a homomorphism when they portray the same objective feature-domain, how the hell can you handle Turing-complete domains!? The equivalence of lambda expressions is undecidable!”
This is why I’m very much a fan of the probabilistic programming approach to computational cognitive science, which clears up these kinds of issues. In a probabilistic programming setting, the probability of extensional equality for two models (where models are distributions over computation traces) is a dead simple and utterly normal query: it’s just p(X == Y), where X and Y are taken to be models (aka: thunk lambdas, aka: distributions from which we can sample). The undecidable question is thus shunted aside in favor of a check that is merely computationally intensive, but can ultimately be done in a bounded-rational way.

My reaction to those simple neural-net accounts of cognition is similar, in that I wanted very much to overcome their (pretty glaring) limitations. I wasn’t so much concerned with inability to handle Turing complete domains, as other more practical issues. But I came to a different conclusion about the value of probabilistic programming approaches, because that seems to force the real world to conform to the idealized world of a branch of mathematics, and, like Leonardo, I don’t like telling Nature what she should be doing with her designs. ;-)
Under the heading of ‘interesting history’ it might be worth mentioning that I hit my first frustration with neural nets at the very time that it was bursting into full bloom—I was part of the revolution that shook cognitive science in the mid to late 1980s. Even while it was in full swing, I was already going beyond it. And I have continued on that path ever since. Tragically, the bulk of NN researchers stayed loyal to the very simplistic systems invented in the first blush of that spring, and never seemed to really understand that they had boxed themselves into a dead end.
Ah, but Nature’s elegant design for an embodied creature is precisely a bounded-Bayesian reasoner! You just minimize the free energy of the environment.
Could you explain the kinds of neural networks beyond the standard feedforward, convolutional, and recurrent supervised networks? In particular, I’d really appreciate hearing a connectionist’s view on how unsupervised neural networks can learn to convert low-level sensory features into the kind of more abstracted, “objectified” (in the sense of “made objective”) features that can be used for the bottom, most concrete layer of causal modelling.
Yikes! No. :-)
That paper couldn’t be a more perfect example of what I meant when I said
In other words, the paper talks about a theoretical entity which is a descriptive model (not a functional model) of one aspect of human decision making behavior. That means you cannot jump to the conclusion that this is “nature’s design for an embodied creature”.
About your second question. I can only give you an overview, but the essential ingredient is that to go beyond the standard neural nets you need to consider neuron-like objects that are actually free to be created and destroyed like processes on a network, and which interact with one another using more elaborate, generalized versions of the rules that govern simple nets.
From there it is easy to get to unsupervised concept building because the spontaneous activity of these atoms (my preferred term) involves searching for minimum-energy* configurations that describe the world.
*There is actually more than one type of ‘energy’ being simultaneously minimized in the systems I work on.
You can read a few more hints of this stuff in my 2010 paper with Trevor Harley (which is actually on a different topic, but I threw in a sketch of the cognitive system for purposes of illustrating my point in that paper).
Reference: Loosemore, R.P.W. & Harley, T.A. (2010). Brains and Minds: On the Usefulness of Localisation Data to Cognitive Psychology. In M. Bunzl & S.J. Hanson (Eds.), Foundational Issues of Neuroimaging. Cambridge, MA: MIT Press. http://richardloosemore.com/docs/2010a_BrainImaging_rpwl_tah.pdf
That is an interesting aspect of one particular way to deal with the problem that I have not yet heard about and I’d like to see a reference for that to read up on it.
I first started trying to explain, informally, how these types of systems could work back in 2005. The reception was so negative that it led to a nasty flame war.
I have continued to work on these systems, but there is a problem with publishing too much detail about them. The very same mechanisms that make the motivation engine a safer type of beast (as described above) also make the main AGI mechanisms extremely powerful. That creates a dilemma: talk about the safety issues, and almost inevitably I have to talk about the powerful design. So, I have given some details in my published papers, but the design is largely under wraps, being developed as an AGI project, outside the glare of publicity.
I am still trying to find ways to write a publishable paper about this class of systems, and when/if I do I will let everyone know about it. In the mean time, much of the core technology is already described in some of the references that you will find in my papers (including the one above). The McClelland and Rumelhart reference, in particular, talks about the fundamental ideas behind connectionist systems. There is also a good paper by Hofstadter called “Jumbo” which illustrates another simple system that operates with multiple weak constraints. Finally, I would recommend that you check out Geoff Hinton’s early work.
In all your neural net reading, it is important to stay above the mathematical details and focus on the ideas, because the math is a distraction from the more important message.
I first read McClelland and Rumelhart ~20 years ago and it has a prominent place in my book shelf. I haven’t been able to actively work in AI, but I have followed the field. I put some hope in integrated connectionist symbolic systems and was rewarded with deep neural networks lately. I think that every advanced system will need some non-symbolic approach to integrate reality. I don’t know whether it will be NNs or some other statistical means. And the really tricky part will be to figure out how to pre-wire it such that it ‘does what it should’. I think a lot will be learned from how the same is realized in the human brain.
Maybe I’m biased as an open proponent of probabilistic programming, but I think the latter can make AGI at all, while the former not only would result in opaque AGI, but basically can’t result in a successful real-world AGI at all.
I don’t think you can get away from the need to do hierarchical inference on complex models in Turing-complete domains (in short: something very like certain models expressible in probabilistic programming). A deep neural net is basically just drawing polygons in a hierarchy of feature spaces, and hoping your polygons have enough edges to approximate the shape you really mean but not so many edges that they take random noise in the training data to be part of the shape—given just the right conditions, it can approximate the right thing, but it can’t even describe how to do the right thing in general.
Why does everyone suppose that there are a thousand different ways to learn concepts (ie: classifiers), but no normatively correct way for an AI to learn concepts? It seems strange to me that we think we can only work with a randomly selected concept-learning algorithm or the One Truly Human Concept-Learning Algorithm, but can’t say when the human is wrong.
We can do something like list a bunch of examples, have humans label them, and then find the lowest Kolmogorov complexity concept that agrees with human judgments in, say, 90% of cases. I’m not sure if this is what you mean by “normatively correct”, but it seems like a plausible concept that multiple concept learning algorithms might converge on. I’m still not convinced that we can do this for many value-laden concepts we care about and end up with something matching CEV, partially due to complexity of value. Still, it’s probably worth systematically studying the extent to which this will give the right answers for non-value-laden concepts, and then see what can be done about value-laden concepts.
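As a rough sketch of the kind of procedure I have in mind, with compressed source length standing in for Kolmogorov complexity (which is uncomputable) and with all examples and candidate concepts invented:

```python
import zlib

# Toy version: numbers 0..59, human judgments of "small even number",
# with one mislabeled case to show why we only ask for ~90% agreement.
examples = [(n, n % 2 == 0 and n < 20) for n in range(60)]
examples[4] = (4, False)   # a human labeling error

# Candidate concepts, written as short Python expressions over n.
candidates = [
    "n % 2 == 0",
    "n % 2 == 0 and n < 20",
    "n in (0, 2, 6, 8, 10, 12, 14, 16, 18)",   # memorizes the labels, error included
    "n < 20",
]

def complexity(src):
    """Stand-in for Kolmogorov complexity: length of the compressed source."""
    return len(zlib.compress(src.encode()))

def agreement(src):
    hits = 0
    for n, label in examples:
        hits += (eval(src, {"n": n}) == label)
    return hits / len(examples)

viable = [c for c in candidates if agreement(c) >= 0.9]
print(min(viable, key=complexity))   # the short general rule beats the memorized list
```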
Regularization is already a part of training any good classifier.
Roughly speaking, I mean optimizing for the causal-predictive success of a generative model, given not only a training set but a “level of abstraction” (something like tagging the training features with lower-level concepts, type-checking for feature data) and a “context” (ie: which assumptions are being conditioned-on when learning the model).
Again, roughly speaking, humans tend to make pretty blatant categorization errors (ie: magical categories, non-natural hypotheses, etc.), but we also are doing causal modelling in the first place, so we accept fully-naturalized causal models as the correct way to handle concepts. However, we also handle reality on multiple levels of abstraction: we can think in chairs and raw materials and chemical treatments and molecular physics, all of which are entirely real. For something like FAI, I want a concept-learning algorithm that will look at the world in this naturalized, causal way (which is what normal modelling shoots for!), and that will model correctly at any level of abstraction or under any available set of features, and will be able to map between these levels as the human mind can.
Basically, I want my “FAI” to be built out of algorithms that can dissolve questions and do other forms of conceptual analysis without turning Straw Vulcan and saying, “Because ‘goodness’ dissolves into these other things when I naturalize it, it can’t be real!”. Because once I get that kind of conceptual understanding, it really does get a lot closer to being a problem of just telling the agent to optimize for “goodness” and trusting its conceptual inference to work out what I mean by that.
Sorry for rambling, but I think I need to do more cog-sci reading to clarify my own thoughts here.
A technical point here: we don’t learn a raw classifier, because that would just learn human judgments. In order to allow the system to disagree with a human, we need to use some metric other than “is simple and assigns high probability to human judgments”.
I totally agree that a good understanding of multi-level models is important for understanding FAI concept spaces. I don’t have a good understanding of multi-level maps; we can definitely see them as useful constructs for bounded reasoners, but it seems difficult to integrate higher levels into the goal system without deciding things about the high-level map a priori so you can define goals relative to this.
Well, all real reasoners are bounded reasoners. If you just don’t care about computational time bounds, you can run the Ordered Optimal Problem Solver as the initial input program to a Goedel Machine, and out pops your AI (in 200 trillion years, of course)!
I would tend to say that you should be training a conceptual map of the world before you install anything like action-taking capability or a goal system of any kind. Of course, I also tend to say that you should just use a debugged (ie: cured of systematic faults) model of human evaluative processes for your goal system, and then use actual human evaluations to train the free parameters, and then set up learning feedback from the learned concept of “human” to the free-parameter space of the evaluation model.
This seems like a sane thing to do. If this didn’t work, it would probably be because of one of the following:
1) Lack of conceptual convergence and human understandability; this seems somewhat likely and is probably the most important unknown.
2) Our conceptual representations are only efficient for talking about things we care about because we care about these things; a “neutral” standard such as resource-bounded Solomonoff induction will learn the things we care about horribly, for “no free lunch” reasons. I find this plausible but not too likely (it seems like it ought to be possible to “bootstrap” an importance metric for deciding where in the concept space to allocate resources).
3) We need the system to have a goal system in order to self-improve to the point of creating this conceptual map. I find this a little likely (this is basically the question of whether we can create something that manages to self-improve without needing goals; it is related to low impact).
I agree that this is a good idea. It seems like the main problem here is that we need some sort of “skeleton” of a normative human model whose parts can be filled in empirically, and which will infer the right goals after enough training.
Right: and the metric I would propose is, “counterfactual-prediction power”. Or in other words, the power to predict well in a causal fashion, to be able to answer counterfactual questions or predict well when we deliberately vary the experimental conditions.
To give a simple example: I train a system to recognize cats, but my training data contains only tabbies. What I want is a way of modelling that, while it may concentrate more probability on a tabby cat-like-thingy being a cat than a non-tabby cat-like-thingy, will still predict appropriately if I actually condition it on “but what if cats weren’t tabby by nature?”.
I think you said you’re a follower of the probabilistic programming approach, and in terms of being able to condition those models on counterfactual parameterizations and make predictions, I think they’re very much on the right track.
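For what it’s worth, here is the kind of toy probabilistic program I have in mind, using crude rejection sampling over an invented cat/dog model; none of the numbers or names come from an actual system:

```python
import random
random.seed(0)

def generate_animal(p_tabby_given_cat):
    """Tiny causal model: species causes (noisy) shape; coat pattern has its
    own adjustable cause, exposed as a parameter so we can intervene on it."""
    species = "cat" if random.random() < 0.5 else "dog"
    shape_noise = random.random()
    if species == "cat":
        shape = "cat-shaped" if shape_noise < 0.8 else "dog-shaped"
        coat = "tabby" if random.random() < p_tabby_given_cat else "plain"
    else:
        shape = "cat-shaped" if shape_noise < 0.1 else "dog-shaped"
        coat = "plain"
    return species, shape, coat

def p_cat_given(shape, coat, p_tabby_given_cat, n=100_000):
    """Posterior P(species = cat | shape, coat), by crude rejection sampling."""
    hits = cats = 0
    for _ in range(n):
        s, sh, co = generate_animal(p_tabby_given_cat)
        if (sh, co) == (shape, coat):
            hits += 1
            cats += (s == "cat")
    return cats / hits if hits else float("nan")

# With the "as trained" parameter, a plain-coated cat-shaped thing looks dubious...
print(round(p_cat_given("cat-shaped", "plain", p_tabby_given_cat=0.95), 2))  # ~0.29
# ...but conditioned on the counterfactual "cats aren't tabby by nature",
# the same shape evidence confidently identifies it as a cat.
print(round(p_cat_given("cat-shaped", "plain", p_tabby_given_cat=0.05), 2))  # ~0.88
```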
I suggest quoting the remarks using the markdown syntax with a > in front of the line, like so:
That will look like this:
You can then respond to the quotes afterwards, and the flow will be more obvious to the reader.
Thank you. I edited the remarks to conform. I was not familiar with the mechanism for quoting here. Let me know if I missed any.
You’re welcome!
This assumes that no human being would ever try to just veto everything to spite everyone else. A process for determining AGI volition that is even more overconstrained and impossible to get anything through than a homeowners’ association meeting sounds to me like a bad idea.