The main scenario I had implicitly in mind had something resembling a “childhood” for the AI, where its power and intelligence would be gradually increased while it interacted with human programmers in a training environment and was given feedback on what was considered “good” or “bad”, so that it would gradually develop concepts that approximated human morality as it tried to maximize the positive feedback.
This really is the most realistic scenario for AGI in general, given the generality of the RL architecture. Of course, there are many variations—especially in how the training environment and utility feedback interact.
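As a very rough illustration of how that loop might be wired up (everything here, from the toy environment to the trainer_feedback function, is invented for the sketch rather than taken from any actual proposal): the trainer's good/bad signal is the agent's only reward, so whatever regularities the trainer rewards are what the agent ends up valuing.

```python
import random
from collections import defaultdict

ACTIONS = ["share", "hoard", "wait"]

def trainer_feedback(state, action):
    """Stand-in for the human trainers: approve sharing, disapprove hoarding."""
    if action == "share":
        return 1.0
    if action == "hoard":
        return -1.0
    return 0.0

def step(state, action):
    """Trivial 'training environment': the state just counts interaction rounds."""
    return (state + 1) % 5

def train(episodes=500, horizon=10, alpha=0.1, gamma=0.9, epsilon=0.1):
    q = defaultdict(float)  # (state, action) -> learned value estimate
    for _ in range(episodes):
        state = 0
        for _ in range(horizon):
            if random.random() < epsilon:
                action = random.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])
            reward = trainer_feedback(state, action)  # the feedback *is* the reward
            nxt = step(state, action)
            best_next = max(q[(nxt, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = nxt
    return q

q = train()
print({a: round(q[(0, a)], 2) for a in ACTIONS})  # "share" ends up with the highest value
```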
Possibly even giving it a humanoid body at first, to further give a human-like grounding to its concepts.
If we want the AI to do human-ish labor tasks, a humanoid body makes lots of sense. It also makes sense for virtual acting, interacting with humans in general, etc. A virtual humanoid body has many advantages—with instantiation in a physical robot as a special case.
(Another potential problem with it is that the AI’s values would become quite strongly shaped by those of its programmers, which not everyone would be likely to agree with.)
Yep—kind of unavoidable unless somebody releases the first advanced AGI for free. Even then, most people wouldn’t invest the time to educate it and instill their own values.
Another scenario I thought of would be to train the AI by something like the word embedding models, i.e. being given a vast set of moral judgments and then needing to come up with concepts simulating human moral reasoning in order to correctly predict the “right” judgments.
So say you train the AI to compute a mapping between a sentence in English describing a moral scenario and a corresponding sentiment/utility: how do you translate that into the AI’s reward/utility function? You’d need to somehow also map encodings of imagined moral scenarios back and forth between encodings of observation histories.
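The first half of that, text in and judgment out, is easy enough to sketch. Something like the toy below, a bag-of-words model fit to a few invented scenario/label pairs, stands in for the embedding-style approach; all of the data and names are made up for illustration. The open question is everything after: hooking the model's output up to the agent's own encodings of situations.

```python
import random

# Invented placeholder data: (English scenario, human judgment in [-1, 1]).
LABELED_SCENARIOS = [
    ("the agent returns the lost wallet to its owner", 1.0),
    ("the agent lies to gain an advantage", -1.0),
    ("the agent helps a stranger carry groceries", 1.0),
    ("the agent breaks a promise for convenience", -1.0),
]

def features(text):
    # Crude bag-of-words stand-in for a real sentence encoder / embedding model.
    return text.lower().split()

def predict(weights, text):
    return sum(weights.get(w, 0.0) for w in features(text))

def fit(data, epochs=300, lr=0.05):
    weights = {}
    data = list(data)
    for _ in range(epochs):
        random.shuffle(data)
        for text, target in data:
            error = predict(weights, text) - target  # squared-error gradient step
            for w in features(text):
                weights[w] = weights.get(w, 0.0) - lr * error
    return weights

weights = fit(LABELED_SCENARIOS)
print(round(predict(weights, "the agent returns the wallet"), 2))  # should come out positive
```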
This really is the most realistic scenario for AGI in general, given the generality of the RL architecture.
Of course, “gradually training the AGI’s values through an extended childhood” gets tricky if it turns out that there’s a hard takeoff.
So say you train the AI to compute a mapping between a sentence in English describing a moral scenario and a corresponding sentiment/utility: how do you translate that into the AI’s reward/utility function? You’d need to somehow also map encodings of imagined moral scenarios back and forth between encodings of observation histories.
I was thinking that the task of training the AI to classify human judgments would then lead to it building up a model of human values, similar to the way that training a system to do word prediction builds up a language/world model. You make a good point about the need to then ground those values further; I haven’t really thought about that part.
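Roughly, I picture the gap like this (the describe_outcome lookup below is a placeholder for exactly the part I don't know how to do): the judgment model scores English descriptions, so the agent's imagined outcomes would somehow have to be rendered into that text space before the model could serve as a utility function over them.

```python
# The text-trained judgment model scores English descriptions, so to use it as the
# agent's utility function, each imagined outcome (really an encoded observation
# history) would first have to be rendered into that text space. describe_outcome
# below is the missing piece; a hand-written lookup is exactly what wouldn't scale.

def judgment_model(description):
    """Stand-in for a model trained on human moral judgments, as in the sketch above."""
    return 1.0 if "returns the lost wallet" in description else -1.0

HAND_WRITTEN_DESCRIPTIONS = {
    "plan_a": "the agent returns the lost wallet to its owner",
    "plan_b": "the agent lies to gain an advantage",
}

def describe_outcome(imagined_outcome):
    """Placeholder for the hard part: observation history -> English description."""
    return HAND_WRITTEN_DESCRIPTIONS[imagined_outcome]

def plan_utility(imagined_outcome):
    return judgment_model(describe_outcome(imagined_outcome))

print(max(["plan_a", "plan_b"], key=plan_utility))  # picks "plan_a"
```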
Of course, “gradually training the AGI’s values through an extended childhood” gets tricky if it turns out that there’s a hard takeoff.
Yes. Once you get the AGI up to roughly human child level, presumably autodidactic learning could take over. Reading and understanding text on the internet is one key activity that could likely be sped up by a large factor.
So—then we need ways to speed up human interaction and supervision to match.