That is mostly an argument by intuition. To make it more rigorous and transparent, here is a constructive proof:
Let the curriculum be sampled uniformly at random. This has ~no mutual information with the world. Therefore the AI does not learn any methods in the world that can cause human extinction.
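To make the mutual-information step explicit (this is one way to formalise it; the notation is illustrative rather than taken from TurnTrout's post): treat the sampled curriculum $C$ and the state of the world $W$ as random variables. If $C$ is drawn uniformly at random, independently of $W$, then $p(c, w) = p(c)\,p(w)$ and

$$I(C; W) = H(C) - H(C \mid W) = H(C) - H(C) = 0,$$

which is the idealised, exactly-zero case that the "~no mutual information" claim gestures at.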
Does this not mean the AI has learnt no methods that provide any economic benefit either?
Yes. TurnTrout’s intuitive argument did not contain any premises that implied the AI learnt any methods that provide any economic benefit, so I thought it wouldn’t be necessary to include that in the constructive proof either.
Right, but that isn’t a good safety case because such an AI hasn’t learnt about the world and isn’t capable of doing anything useful. I don’t see why anyone would dedicate resources to training such a machine.
I didn’t understand TurnTrout’s original argument to be limited to only “trivially safe” (i.e. non-functional) AI systems.
How did you understand the argument instead?
I can see that the condition you’ve given, that a “curriculum be sampled uniformly at random” with no mutual information with the real world, is sufficient for a curriculum to satisfy Premise 1 of TurnTrout’s argument.
But it isn’t immediately obvious to me that it is also a necessary condition (and therefore equivalent to Premise 1).
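To spell out the distinction I’m drawing (a sketch in my own notation, not wording from either post): write $X$ for “the curriculum is sampled uniformly at random, with no mutual information with the world” and $P_1$ for Premise 1. What seems clear is $X \Rightarrow P_1$ (sufficiency); equivalence would additionally require $P_1 \Rightarrow X$ (necessity), and it is that second direction I don’t yet see.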
I’m not claiming to have shown something equivalent to Premise 1; I’m claiming to have shown something equivalent to the conclusion of the proof (that it’s possible to make an AI which very probably does not cause x-risk), inspired by the general idea of the proof but simplifying it and making it constructive so that it is more rigorous and transparent.
I might be misunderstanding something crucial, or not expressing myself clearly.
I understand TurnTrout’s original post to be an argument for a set of conditions which, if satisfied, prove the AI is (probably) safe. There are no restrictions on the capabilities of the system given in the argument.
You do constructively show “that it’s possible to make an AI which very probably does not cause x-risk” using a system that cannot do anything coherent when deployed.
But TurnTrout’s post is not merely arguing that it is “possible” to build a safe AI.
Your conclusion is trivially true, and there are simpler examples of “safe” systems if you don’t require them to do anything useful or coherent. For example, a fried, unpowered GPU is guaranteed to be “safe”, but that isn’t telling me anything useful.