RobertKirk comments on Epistemological Vigilance for Alignment

RobertKirk 6 Jun 2022 13:57 UTC
LW: 3 AF: 3

Newtonian: complex reactions

So please suggest alternative names and characterizations, or ask questions to pinpoint what I’m describing.

Are you pointing here at the fact that the AI training process and world will be a complex system, and as such it is hard to predict the outcomes of interventions, and hence the first-order obvious outcomes of interventions may not occur, or may be dominated by higher-order outcomes? That’s what “complex reactions” and some of the references point at. But in the description you seem to be talking about a more specific case: strong optimisation will always find a path if one exists, so patching some but not all paths isn’t useful, and could even be counter-productive if the remaining paths the optimisation takes are worse in other ways than the ones you patched.

Other possible names could then either lean into the complex systems view, where the (possibly incorrect) assumption is something like “non-complexity” or “linear/predictable responses”, or lean into the optimisation paths analogy, with something like “incremental improvement is ok”, although that is pretty bad as a name.
- adamShimi 6 Jun 2022 17:44 UTC
  LW: 4 AF: 3
  Are you pointing here at the fact that the AI training process and world will be a complex system, and as such it is hard to predict the outcomes of interventions, and hence the first-order obvious outcomes of interventions may not occur, or may be dominated by higher-order outcomes?
  This points at the same thing IMO, although still in a confusing way. This assumption is basically that you can predict the result of an intervention without having to understand the internal mechanism in detail, because the latter is straightforward.
  Other possible names could then either lean into the complex systems view, where the (possibly incorrect) assumption is something like “non-complexity” or “linear/predictable responses”, or lean into the optimisation paths analogy, with something like “incremental improvement is ok”, although that is pretty bad as a name.
  Someone at Conjecture proposed linear too, but Newtonian physics isn’t linear. That said, I agree that the sort of behavior and reaction I’m pointing out fits within the “non-linear” category.
  - Kenoubi 8 Jun 2022 20:12 UTC
    2 points
    Thermodynamic? Thermodynamics seems to be about using a small number of summary statistics (temperature, pressure, density, etc.) because the microstructure of the system isn’t necessary to compute what will happen at the macro level.
  - RobertKirk 20 Jun 2022 9:40 UTC
    LW: 1 AF: 1
    
    This assumption is basically that you can predict the result of an intervention without having to understand the internal mechanism in detail, because the latter is straightforward.
    
    It seems to me that you want a word for whatever the opposite of a complex/chaotic system is, right? “Simple” is obviously not the best word, as it’s very generic. It could be “Simple Dynamics” or “Predictable Dynamics”?