Interesting fact about backprop: a supply chain of profit-maximizing, competitive companies can be viewed as implementing backprop. Obviously there’s some setup here, but it’s reasonably general; I’ll have a long post on it at some point. This should not be very surprising: backprop is just an efficient algorithm for calculating gradients, and prices in competitive markets are basically just gradients of production functions.
Anyway, my broader point is this: backprop is just an efficient way to calculate gradients. In a distributed system (e.g. a market), it’s not necessarily the most efficient gradient-calculation algorithm. What’s relevant is not whether the brain uses backpropagation per se, but whether it uses gradient descent. If the brain mainly operates off of gradient descent, then we have that theoretical tool already, regardless of the details of how the brain computes the gradient.
Many of the objections listed to brain-as-backprop only apply to single-threaded, vanilla backprop, rather than gradient descent more generally.
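To make the “prices are gradients” claim concrete, here is a minimal toy sketch (a two-firm chain of my own construction, not the setup from the promised post; the production functions and valuation below are made up for illustration). Each good’s competitive price is its marginal value to the downstream user, and those marginal values compose by the chain rule, which is exactly backprop’s backward pass:

```python
# Toy sketch (illustrative assumption, not the post's actual setup): a two-stage
# supply chain where competitive prices equal marginal downstream values,
# computed back-to-front exactly like backprop.
import numpy as np

f1 = lambda x: np.sqrt(x)        # firm 1: raw input -> intermediate good
f2 = lambda g: 3.0 * np.log(g)   # firm 2: intermediate -> final good
v  = lambda y: 10.0 * y          # consumers' valuation of the final good

x = 4.0                          # "forward pass": quantities flow downstream
g1 = f1(x)
g2 = f2(g1)

# "Backward pass": prices flow upstream as marginal values (the chain rule).
p2 = 10.0                        # dv/dg2
p1 = p2 * 3.0 / g1               # dv/dg1 = p2 * f2'(g1)
p0 = p1 * 0.5 / np.sqrt(x)       # dv/dx  = p1 * f1'(x)

# Sanity check against a finite-difference gradient of end-to-end value.
eps = 1e-6
numeric = (v(f2(f1(x + eps))) - v(f2(f1(x)))) / eps
print(p0, numeric)               # the raw input's "price" matches dv/dx
```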
I’m looking forward to reading that post.
Yes, it seems right that gradient descent is the key crux. But I’m not familiar with any efficient way of doing it that the brain might implement, apart from backprop. Do you have any examples?
Here’s my preferred formulation of the general derivative problem (skip to the last paragraph if you just want the summary): you have some function f(x). We’ll assume that it’s been “flattened out”, i.e. all the loops and recursive calls have been expanded, it’s just a straight-line numerical function. Adopting hilariously bad variable names, suppose the i-th line of f computes $y_i$. We’ll also assume that the first lines of f just load in x, so e.g. $y_0 = x_0$. If f has n lines, then the output of f is $y_n$.
Now, we create a vector-valued function F(y), which runs each line of f in parallel: $F_i(y) =$ (line $i$ of $f$ evaluated at $y$). f(x) computes a fixed point $y = F(y)$ (it may take a moment of thought or an example for that part to make sense). It’s that fixed point formula which we differentiate. The result: we get $\frac{\partial F}{\partial x} = A\frac{dy}{dx}$, where A is a very sparse triangular matrix. In fact, we don’t even need to solve the whole thing—we only need $\frac{dy_n}{dx}$. Backprop just uses the usual method for solving triangular matrices: start at the end and work back.
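For concreteness, here is a small worked instance (my own example, not from the comment above, and it assumes the reconstruction of the formula just given): take $f(x) = x\sin(x)$, flattened into three lines $y_0 = x$, $y_1 = \sin(y_0)$, $y_2 = y_0 y_1$. Then $F(y) = (x, \sin(y_0), y_0 y_1)$, the values $f$ computes satisfy $y = F(y)$, and differentiating that fixed point gives

$$A = I - \frac{\partial F}{\partial y} = \begin{pmatrix} 1 & 0 & 0 \\ -\cos(y_0) & 1 & 0 \\ -y_1 & -y_0 & 1 \end{pmatrix}, \qquad \frac{\partial F}{\partial x} = \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}.$$

Solving the triangular system $A\frac{dy}{dx} = \frac{\partial F}{\partial x}$ row by row gives $\frac{dy_2}{dx} = \sin(x) + x\cos(x)$, as expected.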
Main point: derivative calculation, in general, can be done by solving a (sparse, triangular) system of linear equations. There’s a whole field devoted to solving sparse linear systems, especially in parallel. Different methods work better depending on the matrix structure (which will follow the structure of the computation DAG of f), so different methods will work better for different functions. Pick your favorite sparse solver, ideally one which exploits the triangular structure, and boom, you have a derivative calculator.
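Here is a minimal sketch of that recipe (my own code, not from the comment; it reuses the hypothetical $f(x)=x\sin(x)$ example above, with a dense matrix for simplicity). For a real computation DAG you would store $A$ sparsely and could swap in any parallel sparse triangular solver:

```python
# Minimal sketch: derivatives as a (here dense, in general sparse) triangular solve.
# Illustrative code, not from the comment above.
import numpy as np
from scipy.linalg import solve_triangular

x = 1.3

# Forward pass: the straight-line program for f(x) = x*sin(x).
y0 = x
y1 = np.sin(y0)
y2 = y0 * y1                         # f(x)

# A = I - dF/dy: row i has 1 on the diagonal and minus the partials of
# line i with respect to the earlier lines it reads.
A = np.array([
    [1.0,          0.0, 0.0],
    [-np.cos(y0),  1.0, 0.0],
    [-y1,         -y0,  1.0],
])
dF_dx = np.array([1.0, 0.0, 0.0])    # only the load-in line depends on x directly

# Any triangular solver works here; backprop corresponds to solving the
# transposed (upper-triangular) system for just the last component.
dy_dx = solve_triangular(A, dF_dx, lower=True)
print(dy_dx[-1], np.sin(x) + x * np.cos(x))   # these two numbers should match
```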
Side note: do these comments support LaTeX? Is there a page explaining what comments do support? It doesn’t seem to be markdown, no idea what we’re using here.
It is a WYSIWYG markdown editor, and the dollar sign is the symbol that opens the LaTeX editor (I’ve LaTeXed your comment for you, hope that’s okay).
Added: @habryka oops, double-comment!
Ooooh, that makes much more sense now; I was confused by the auto-formatting as I typed. Thank you for taking the time to clean up my comment. Also thank you, @habryka.
Also, how do images work in posts? I was writing up a post the other day, but when I tried to paste in an image it just created a camera symbol. Alternatively, is this stuff documented somewhere?
My transatlantic flight permitting, I’ll reply with a post tomorrow with full descriptions of how to use the editor.
Thank you very much! I really appreciate the time you guys are putting into this.
You’re welcome :-) Here’s a mini-guide to the editor.
The thing is now in LaTeX! Beautiful!
Yep, we support LaTeX and do a WYSIWYG translation of markdown as soon as you type it (i.e. words between asterisks get bolded, etc.). You can start typing LaTeX by typing $, and then a small equation editor shows up. You can also insert block-level equations by pressing CTRL+M.
Typing $ does nothing on my iPhone.
Because the mobile editing experience was pretty buggy, we replaced the mobile editor with a markdown-only editor two days ago. We will activate LaTeX for that editor pretty soon (which will probably mean replacing equations between “$$” with their rendered versions), but in the meantime LaTeX is temporarily unavailable on phones. The previous LaTeX editor didn’t really work on phones anyway, so this is mostly a strict improvement on what we had.
Ok, no problem; I don’t really know LaTeX anyway.
Hello from the future! I’m interested to hear how your views have updated since this comment and post were written.
1. What is your credence that the brain learns via gradient descent?
2. What is your credence that it in fact does so in a way relevantly similar to backprop?
3. Do you still think that insofar as your credence in 1 is high, timelines are short?
I appreciate you following up on this!
The sad and honest truth, though, is that since I wrote this post, I haven’t thought about it. :( I haven’t picked up on any key new piece of evidence—though I also haven’t been looking.
I could give you credences, but that would mostly just involve rereading this and loading up all the thoughts again.
Ok! Well, FWIW, it seems very likely to me that the brain learns via gradient descent, and indeed probable that it does something relevantly similar (though of course not identical) to backprop. (See the link above.) But I feel very much like an imposter discussing all this stuff, since I lack technical expertise. I’d be interested to hear your take on this stuff sometime, if you have one or want to make one! See also:
https://arxiv.org/abs/2006.04182 (Brains = predictive processing = backprop = artificial neural nets)
https://www.biorxiv.org/content/10.1101/764258v2.full (IIRC this provides support for Kaplan’s view that human ability to extrapolate is really just interpolation done by a bigger brain on more and better data.)
I’m currently on vacation, but I’d be interested in setting up a call once I’m back in 2 weeks! :) I’ll send you my calendly in PM