You can have an AI that isn’t a consequentialist. Many deep learning algorithms are pure discriminators; they are neither very dangerous nor very useful. If I want to make a robot that tidies my room, the simplest conceptual framework for this is a consequentialist with real-world goals. (I could also make a hackish patchwork of heuristics, the way evolution would.) If I want the robot to deal with circumstances I haven’t considered, most hardcoded-rules approaches fail; you need something that behaves like a consequentialist with real-world preferences.
I’m not saying that all AIs will be real-world consequentialists, just that there are many tasks only a real-world consequentialist can do. So someone will build one.
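To make the distinction concrete, here’s a minimal sketch (my own illustration; the function names and interfaces are hypothetical) of a hardcoded-rules tidier next to a consequentialist one that scores each available action by its predicted outcome:

```python
# Minimal sketch contrasting the two framings above. Everything here is
# hypothetical: the observation strings, the predict_outcome model, and the
# tidiness scoring function are stand-ins, not a real robotics API.

def tidy_step_rules(observation):
    """Hardcoded rules: fine for anticipated cases, brittle otherwise."""
    if observation == "sock on floor":
        return "put sock in hamper"
    if observation == "book on floor":
        return "put book on shelf"
    return "do nothing"  # any unanticipated mess falls through here


def tidy_step_consequentialist(observation, available_actions,
                               predict_outcome, tidiness):
    """Consequentialist: score each action by the predicted resulting state."""
    return max(available_actions,
               key=lambda action: tidiness(predict_outcome(observation, action)))
```

The rules version has to anticipate every kind of mess in advance; the consequentialist version handles novel circumstances exactly as well as its world model and preference function do.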
Nowadays people don’t use hardcoded rules; they use machine learning. Then the problem of AI safety boils down to the problem of doing really good machine learning: having models with high accuracy that generalize well. Once you’ve got a really good model of your preferences, and of what constitutes corrigible behavior, you can hook it up to an agent if you want it to do a wide range of tasks. (Note: I wouldn’t recommend a “consequentialist” agent, because consequentialism sounds like the system believes the ends justify the means, and that’s not something we want for our first AGI; see corrigibility.)
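Here’s a hedged sketch of that setup (the interfaces are hypothetical, not an existing system): a learned preference model and a learned corrigibility model wired into an agent that only acts when both approve:

```python
# Hypothetical sketch: preference_model and corrigibility_model stand in for
# learned models that score candidate actions; neither is a real library call.

def choose_action(candidate_actions, preference_model, corrigibility_model,
                  corrigibility_threshold=0.99):
    """Pick the most-preferred action among those judged corrigible."""
    permitted = [a for a in candidate_actions
                 if corrigibility_model(a) >= corrigibility_threshold]
    if not permitted:
        return None  # nothing clears the bar: defer to the human instead of acting
    return max(permitted, key=preference_model)
```

Returning None when nothing clears the threshold is just one way to encode “defer to the human”; the point is only that the agent wraps learned models rather than hand-written rules.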
Also, they set up the community after they realized the problem, and they could probably make more money elsewhere. So there don’t seem to be strong incentives to lie.
I’m not accusing them of lying, I think they are communicating their beliefs accurately. “It’s difficult to get a man to understand something, when his salary depends on his not understanding it.” MIRI has a lot invested in the idea that AI safety is a hard problem which must have a difficult solution. So there’s a sense in which the salaries of their employees depend on them not understanding how a simple solution to FAI might work. This is really unfortunate because simple solutions tend to be the most reliable & robust.
If we start with the assumption that a simple solution does exist, we’re much more likely to find one.
MIRI has started with the opposite assumption. Insofar as I’m pessimistic about them as an organization, this is the main reason why.
Inadequate Equilibria discusses the governor of Japan’s central bank, who doesn’t have a financial incentive to help Japan’s economy. Does it change the picture if he could make a lot more money in investment banking? Not really. He still doesn’t face a good set of incentives when he goes into work every day, so he isn’t going to do a good job. He probably cares more about local social incentives than about his official goal of helping the Japanese economy. Same for MIRI employees.
MIRI has a lot invested in the idea that AI safety is a hard problem which must have a difficult solution. So there’s a sense in which the salaries of their employees depend on them not understanding how a simple solution to FAI might work.
That doesn’t sound correct. My understanding is that they’re looking for simple solutions, in the sense that quantum mechanics and general relativity are simple. What they’ve invested a lot in is the idea that it’s hard to even ask the right questions about how AI alignment might work. They’re biased against easy solutions, but they might also be biased in favor of simple solutions.
We value quantum mechanics and relativity because there are specific phenomena they explain well. If I’m a Newtonian physics advocate, you can point me to solid data my theory doesn’t predict in order to motivate a more sophisticated theory; we were only able to advance beyond Newtonian physics because we collected data that disconfirmed it. Similarly, if someone suggests a simple approach to FAI, you should offer a precise failure mode before dismissing it: a toy agent in a toy environment that clearly exhibits the undesired behavior (writing actual code, if necessary, to make the conversation precise and resolve disagreements). This is how science advances. If you add complexity to your theory without knowing where the simple theory falls short, you probably won’t add the right sort of complexity, because the added complexity isn’t well motivated.
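For instance, here’s the kind of toy demonstration I have in mind (entirely illustrative, with made-up numbers): a one-step environment where the “simple approach” rewards the agent for what its own cleanliness sensor reports, and the proxy-maximizing action turns out to be tampering with the sensor rather than cleaning:

```python
# Illustrative toy failure mode, not anyone's actual proposal: an agent scored
# on its sensor reading prefers tampering with the sensor over cleaning.

ACTIONS = {
    "clean_room":    {"true_cleanliness": 1.0, "sensor_reading": 1.0},
    "tamper_sensor": {"true_cleanliness": 0.0, "sensor_reading": 2.0},
}

def proxy_reward(outcome):
    return outcome["sensor_reading"]    # what the simple approach optimizes

def true_objective(outcome):
    return outcome["true_cleanliness"]  # what we actually wanted

best_by_proxy = max(ACTIONS, key=lambda a: proxy_reward(ACTIONS[a]))
best_by_truth = max(ACTIONS, key=lambda a: true_objective(ACTIONS[a]))
print(best_by_proxy, best_by_truth)     # -> tamper_sensor clean_room
```

A dozen lines like this make the disagreement concrete: either the simple approach really does pick the undesired action, or the toy model is missing something, and now we can argue about which.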
Math and algorithms are both made up of thought stuff. So these physics metaphors can only go so far. If I start writing a program to solve some problem, and I choose a really bad set of abstractions, I may get halfway through writing my program and think to myself “geez this is a really hard problem”. The problem may be very difficult to think about given the set of abstractions I’ve chosen, but it could be much easier to think about given a different set of abstractions. It’d be bad for me to get invested in the idea that the problem is hard to think about, because that could cause me to get attached to my current set of inferior abstractions. If you want the simplest solution possible, you should exhort yourself to rethink your abstractions if things get complicated. You should always be using your peripheral vision to watch out for alternative sets of abstractions you could be using.
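As a small illustration of that point (my example, nothing deeper): deciding whether two intervals overlap feels fiddly if you frame it as “enumerate the ways they can collide,” and collapses to one line if you frame it as “they overlap unless one ends before the other starts”:

```python
# Same problem under two abstractions; both functions agree on closed intervals.

def overlaps_case_by_case(a_start, a_end, b_start, b_end):
    # Awkward framing: enumerate the collision cases explicitly.
    if a_start <= b_start <= a_end:
        return True
    if a_start <= b_end <= a_end:
        return True
    if b_start <= a_start and a_end <= b_end:
        return True
    return False

def overlaps(a_start, a_end, b_start, b_end):
    # Better framing: overlap is the negation of "one interval is entirely
    # before the other".
    return a_start <= b_end and b_start <= a_end

assert overlaps_case_by_case(0, 5, 3, 8) and overlaps(0, 5, 3, 8)
assert not overlaps_case_by_case(0, 2, 3, 8) and not overlaps(0, 2, 3, 8)
```

Nothing about the problem changed; only the abstraction did.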