I suspect that AIXI is misleading to think about in large part because it lacks reusable parameters—instead it just memorises all inputs it’s seen so far. Which means the setup doesn’t have episodes, or a training/deployment distinction; nor is any behaviour actually “reinforced”.
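To make "no reusable parameters" concrete, here is a rough sketch in Hutter-style notation (the alphabet symbols $\mathcal{A}$, $\mathcal{O}$, $\mathcal{R}$ and the policy name are my own labels, not anything official):

```latex
% Sketch: AIXI as a bare history-to-action map. There is no parameter
% vector updated between steps; the action at time t is recomputed from
% the raw interaction history, which is why there are no episodes and
% no weights in which behaviour could be "reinforced".
\[
  \pi_{\mathrm{AIXI}} : (\mathcal{A}\times\mathcal{O}\times\mathcal{R})^{*} \to \mathcal{A},
  \qquad
  a_t = \pi_{\mathrm{AIXI}}\!\left(a_1 o_1 r_1 \,\ldots\, a_{t-1} o_{t-1} r_{t-1}\right).
\]
```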
I kind of think the lack of episodes makes it more realistic for many problems, but admittedly not for simulated games. Also, presumably many of the component Turing machines have reusable parameters and reinforce behaviour, although this is hidden by the formalism. [EDIT: I retract the second sentence]
Actually I think this is total nonsense produced by me forgetting the difference between AIXI and Solomonoff induction.
Wait, really? I thought it made sense (although I’d contend that most people don’t think about AIXI in terms of those TMs reinforcing hypotheses, which is the point I’m making). What’s incorrect about it?
Well, now I’m less sure that it’s incorrect. I was originally imagining that, as in Solomonoff induction, the TMs basically directly controlled AIXI’s actions, but that’s not right: there’s an expectimax. And if the TMs reinforce actions by shaping the rewards, in the AIXI formalism you learn that immediately and throw out those TMs.
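To spell out the expectimax point (a sketch of the standard AIXI definition; the universal machine $U$, program length $\ell(q)$ and horizon $m$ are my notation, with actions and percepts up to time $t-1$ taken as fixed):

```latex
% Sketch of AIXI's action choice. The programs q appear only inside the
% mixture under the expectation; actions are chosen by the outer
% expectimax, not by the programs themselves.
\[
  a_t = \arg\max_{a_t} \sum_{o_t r_t} \cdots \max_{a_m} \sum_{o_m r_m}
        \bigl(r_t + \cdots + r_m\bigr)
        \sum_{q \,:\, U(q,\, a_{1:m}) \,=\, o_1 r_1 \ldots o_m r_m} 2^{-\ell(q)} .
\]
% The inner sum ranges only over programs that reproduce the observed
% history, past rewards included. So a program whose "shaped" rewards
% contradict what was actually received contributes nothing from then
% on; in posterior terms its weight drops to zero:
\[
  w_q^{(t)} \;\propto\; 2^{-\ell(q)}\,
  \mathbf{1}\!\left[\, U(q,\, a_{<t}) = o_1 r_1 \ldots o_{t-1} r_{t-1} \,\right].
\]
```

which is the sense in which reward-shaping TMs get thrown out immediately rather than getting to reinforce anything.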
Oh, actually, you’re right (that you were wrong). I think I made the same mistake in my previous comment. Good catch.
Humans don’t have a training / deployment distinction either… Do humans have “reusable parameters”? Not quite sure what you mean by that.
Yes we do: training is our evolutionary history, deployment is an individual lifetime. And our genomes are our reusable parameters.
Unfortunately I haven’t yet written any papers/posts really laying out this analogy, but it’s pretty central to the way I think about AI, and I’m working on a bunch of related stuff as part of my PhD, so hopefully I’ll have a more complete explanation soon.
Oh, OK, I see what you mean. Possibly related: my comment here.