Shmi comments on Confusions in My Model of AI Risk

Shmi 7 Jul 2022 2:35 UTC
4 points
I agree that the term optimization is very slippery. My two go-to examples, which I have used here before, are these. Is a bacterium an optimizer? It bobs in all directions, inching toward higher sugar concentration, growing and dividing as it does so. If so, is a boiling water bubble an optimizer? It bobs in all directions, inching toward higher altitude, growing and dividing as it does so. If not, what is the internal difference?
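To make the comparison concrete, here is a toy sketch of the two systems, assuming a one-dimensional world. Everything in it is hypothetical and chosen for illustration: the `sugar` field, the step size, the jitter range, and the function names are not taken from any real model of chemotaxis or boiling. The bacterium is modelled as a run-and-tumble walker that remembers its last sugar reading; the bubble simply rises under a constant push plus noise.

```python
import random


def sugar(x: float) -> float:
    """Hypothetical sugar concentration, peaking at x = 10."""
    return -abs(x - 10.0)


def bacterium_walk(steps: int, seed: int = 0) -> float:
    """Run-and-tumble: keep heading while sugar improves, else pick a new heading."""
    rng = random.Random(seed)
    x, heading = 0.0, 1.0
    last = sugar(x)  # internal state: the previous reading
    for _ in range(steps):
        x += 0.1 * heading
        now = sugar(x)
        if now < last:  # conditions got worse, so tumble to a random heading
            heading = rng.choice([-1.0, 1.0])
        last = now
    return x


def bubble_rise(steps: int, seed: int = 0) -> float:
    """Constant upward push plus jitter: no measurement and no memory."""
    rng = random.Random(seed)
    height = 0.0
    for _ in range(steps):
        height += 0.1 + rng.uniform(-0.05, 0.05)
    return height
```

Both walkers end up "higher" by their own measure. In this sketch, the only structural difference is that `bacterium_walk` compares a stored reading against a new one and changes its behaviour as a result, while `bubble_rise` never consults its surroundings. Whether that is the right place to draw the line is the question at issue.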
- Oliver Sourbut 7 Jul 2022 14:24 UTC
  2 points
  Parent
  In this response I eschew the word ‘optimization’^[1] but ‘control procedure’ might be synonymous with one rendering of ‘optimization’.
  
  Some bacteria perform^[2] a basic deliberation, ‘trying out’ alternative directions and periodically evaluating a heuristic (e.g. estimated sugar density) to seek out preferred locations. Iterated, this produces a simple control procedure which locates food items and avoids harmful substances. It can do this in a wide range of contexts, but clearly not all (as Peter alluded to via No Free Lunch). Put growing and dividing aside for now (they are separate algorithms).
  
  A boiling water bubble doesn’t do any deliberation—it’s a ‘reaction’ in my terminology. But, within the context of ‘is underwater in X temperature and Y pressure range and Z gravitational field distribution’, its movement and essential nature are preserved, so it’s ‘iterated’, and hence the relatively direct path to the surface can be thought of as a consequence of a (very very basic) control procedure. Outside of this context it’s disabled or destroyed.
  
  I take these basic examples as belonging to a spectrum of control procedures. Much more sophisticated ones may be able to proceed more efficiently to their goals, or do so from a wider range of starting conditions.
  
  EDIT: to be clear, I think the internal difference between the bubble and the bacterium is that the bacterium evaluates e.g. sugar concentrations to form a (very minimal) estimated model of the ‘world’ around it, and these evaluations affect its ongoing behaviour. The bubble doesn’t do this.
  ^[1] For the same reasons I have been trying to eschew ‘agent’.

  ^[2] HT John Wentworth for this video link.
  - Shmi 7 Jul 2022 19:45 UTC
    2 points
    Parent
    Right, so the difference between an optimizer-like control procedure and your basic reaction-based control procedure is the existence of an identifiable “world model” that is used for “deliberation”, where the deliberation engine is separate from the world model but uses it to “make decisions”? Or am I missing something?
    - Oliver Sourbut 7 Jul 2022 23:08 UTC
      1 point
      Parent
      Yes, pretty much. That’s a distinction I’d draw as meaningful, except I’d call the first one a ‘deliberative (optive) control procedure’, not an ‘optimizer’, because I think ‘optimizer’ has too many vague connotations.
      
      The ‘world model’ doesn’t have to be separate from the deliberation, or even manifested at all: consider iterated natural selection, which deliberates over mutations, without having a separate ‘model’ of anything—because the evaluation is the promotion and the action (unless you count the world itself and the replication counts of various traits as the model). But in the bacterial case, there really is some (basic) world model in the form of internal chemical states.
      - Oliver Sourbut 7 Jul 2022 23:11 UTC
        1 point
        Parent
        P.S. Plants also do the basic thing I’d call deliberative control (or iterated deliberation). In the cases I described in that link, the model state is represented in analogue by the physical growth of the plant.
        
        (And yes, in all cases these are inner misaligned in some weak fashion.)
- FeepingCreature 7 Jul 2022 10:13 UTC
  2 points
  Parent
  I think a bacterium is not an optimizer. Rather, it is optimized by evolution. Animals start being optimizers by virtue of planning over internal representations of external states, which makes them mesaoptimizers of evolution.
  
  If we follow this model, we may consider that optimization requires a map-territory distinction. In that view, DNA is the map of evolution, and the CNS is the map of the animal. If the analogy holds, I’d speculate that the weights are the map of reinforcement learning, and the context window is the map of the mesaoptimizer.
  - Shmi 7 Jul 2022 19:32 UTC
    2 points
    Parent
    > I think a bacterium is not an optimizer. Rather, it is optimized by evolution. Animals start being optimizers by virtue of planning over internal representations of external states, which makes them mesaoptimizers of evolution.
    
    Hmm, so where does the “true” optimization start? Or, at least, what is the range of living creatures which are not quite complex enough to count as optimizers? Clearly a fish would be one, right? What about a sea cucumber? A plant?
    - FeepingCreature 15 Jul 2022 12:58 UTC
      4 points
      Parent
      Hm, difficult. I think the minimal required trait is the ability to learn patterns that map outputs to deferred reward inputs. So an organism that simply reacts to inputs directly would not be an optimizer, even if it has a (static) nervous system. A test might be whether the organism can be made to persistently change strategy by a change in reward, even in the immediate absence of the reward signal.
      
      I think maybe you could say that ants are not anthill optimizers? Because the optimization mechanism doesn’t operate at all on the scale of individual ants? Not sure if that holds up.
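
The test proposed in the comment above can be sketched as a toy experiment. This is a simplified illustration under stated assumptions, not a claim about real organisms: the `Reactor` and `Learner` classes, the two actions, the learning rate of 0.5, and the forced-exposure training phase are all hypothetical, and reward here is delivered immediately during training rather than deferred. The probe phase delivers no reward at all, so any change in behaviour has to come from stored state.

```python
class Reactor:
    """Fixed stimulus-response mapping; reward is ignored entirely."""

    def act(self) -> str:
        return "left"

    def learn(self, action: str, reward: float) -> None:
        pass  # nothing is stored, so nothing can change


class Learner:
    """Keeps a running value estimate per action and picks the highest."""

    def __init__(self) -> None:
        self.value = {"left": 0.0, "right": 0.0}

    def act(self) -> str:
        return max(self.value, key=self.value.get)

    def learn(self, action: str, reward: float) -> None:
        # Move the stored estimate halfway toward the observed reward.
        self.value[action] += 0.5 * (reward - self.value[action])


def changes_strategy(organism, trials: int = 20) -> bool:
    """Reward the non-preferred action, then probe with no reward signal."""
    before = organism.act()
    # Training phase (assumed): the experimenter exposes the organism to both
    # actions and rewards only the one it did not initially choose.
    for _ in range(trials):
        for action in ("left", "right"):
            organism.learn(action, 1.0 if action != before else -1.0)
    # Probe phase: no reward is delivered; only stored state can matter.
    probes = [organism.act() for _ in range(5)]
    return all(choice != before for choice in probes)
```

Under these assumptions, `changes_strategy(Learner())` is true and `changes_strategy(Reactor())` is false, which matches the intuition that a purely reactive organism fails the test however elaborate its fixed wiring is.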