[deleted] comments on Debunking Fallacies in the Theory of AI Motivation

[deleted] May 7, 2015, 3:15 PM
2 points

I don’t have a good understanding of multi-level maps; we can definitely see them as useful constructs for bounded reasoners

Well, all real reasoners are bounded reasoners. If you just don’t care about computational time bounds, you can run the Optimal Ordered Problem Solver as the initial input program to a Goedel Machine, and out pops your AI (in 200 trillion years, of course)!

it seems difficult to integrate higher levels into the goal system without deciding things about the high-level map a priori so you can define goals relative to this.

I would tend to say that you should be training a conceptual map of the world before you install anything like action-taking capability or a goal system of any kind. Of course, I also tend to say that you should just use a debugged (i.e., cured of systematic faults) model of human evaluative processes for your goal system, and then use actual human evaluations to train the free parameters, and then set up learning feedback from the learned concept of “human” to the free-parameter space of the evaluation model.
- jessicat May 7, 2015, 5:18 PM
  5 points
  I would tend to say that you should be training a conceptual map of the world before you install anything like action-taking capability or a goal system of any kind.
  
  This seems like a sane thing to do. If this didn’t work, it would probably be for one of the following reasons:
  1. lack of conceptual convergence and human understandability; this seems somewhat likely and is probably the most important unknown
  2. our conceptual representations are only efficient for talking about things we care about because we care about these things; a “neutral” standard such as resource-bounded Solomonoff induction will learn the things we care about very poorly, for “no free lunch” reasons. I find this plausible but not too likely (it seems like it ought to be possible to “bootstrap” an importance metric for deciding where in the concept space to allocate resources).
  3. we need the system to have a goal system in order to self-improve to the point of creating this conceptual map. I find this slightly likely (this is basically the question of whether we can create something that manages to self-improve without needing goals; it is related to low impact).
  Of course, I also tend to say that you should just use a debugged (i.e., cured of systematic faults) model of human evaluative processes for your goal system, and then use actual human evaluations to train the free parameters, and then set up learning feedback from the learned concept of “human” to the free-parameter space of the evaluation model.
  
  I agree that this is a good idea. It seems like the main problem here is that we need some sort of “skeleton” of a normative human model whose parts can be filled in empirically, and which will infer the right goals after enough training.
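
To make the "skeleton with empirically filled parts" idea concrete, here is a deliberately tiny sketch, not anything proposed in the thread. It assumes the skeleton is a fixed set of feature functions with a linear form, and that the free parameters are weights fit to human ratings by gradient descent. The feature names `wellbeing` and `harm`, the linear model, and the ratings are all hypothetical and made up for illustration.

```python
# Toy sketch: a fixed "skeleton" (feature functions) whose free
# parameters (weights) are fit to human evaluations.
# Everything here is an illustrative assumption, not a real proposal.

def features(outcome):
    # Hypothetical skeleton: the structure is decided in advance,
    # only the weights applied to these features are learned.
    return [1.0, outcome["wellbeing"], outcome["harm"]]

def evaluate(weights, outcome):
    # Linear evaluation model: weighted sum of the skeleton's features.
    return sum(w * f for w, f in zip(weights, features(outcome)))

def fit(examples, steps=5000, lr=0.05):
    # Plain batch gradient descent on mean squared error
    # between model evaluations and human ratings.
    weights = [0.0, 0.0, 0.0]
    for _ in range(steps):
        grad = [0.0] * len(weights)
        for outcome, rating in examples:
            err = evaluate(weights, outcome) - rating
            for i, f in enumerate(features(outcome)):
                grad[i] += 2 * err * f / len(examples)
        weights = [w - lr * g for w, g in zip(weights, grad)]
    return weights

# Made-up "human evaluations", generated from
# rating = 1 + 2 * wellbeing - 3 * harm.
examples = [
    ({"wellbeing": 0.0, "harm": 0.0}, 1.0),
    ({"wellbeing": 1.0, "harm": 0.0}, 3.0),
    ({"wellbeing": 0.0, "harm": 1.0}, -2.0),
    ({"wellbeing": 1.0, "harm": 1.0}, 0.0),
]

weights = fit(examples)
print([round(w, 3) for w in weights])
```

The hard part discussed above is exactly what this toy skips: choosing a skeleton whose learned parameters generalize to the right goals, rather than merely fitting the ratings it was shown.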