In my life I have never seen a good one-paragraph explanation of backpropagation, so I wrote one.
The most natural algorithms for calculating derivatives work by going through the expression syntax tree[1]. The tree has two ends, and starting the algorithm from each end gives one of two good derivative algorithms: forward propagation (starting from the input variables) and backward propagation (starting from the output variables). In both algorithms, calculating the derivative of one output variable y_1 with respect to one input variable x_1 creates a lot of intermediate artifacts. In forward propagation, these artifacts mean you get ∂y_n/∂x_1 for every output y_n for ~free; in backward propagation, you get ∂y_1/∂x_n for every input x_n for ~free. Backpropagation is used in machine learning because there is usually only one output variable (the loss, a number representing the difference between the model's prediction and reality) but a lot of input variables (the parameters, on the scale of millions to billions).
This blogpost has the clearest explanation. Credits for the image too.

[1] Or maybe a directed acyclic graph, for multivariable vector-valued functions like f(x, y) = (2x + y, y - x).
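To make the two directions concrete, here is a minimal sketch of both modes on a tiny expression tree. Everything in it (the example function y = x1*x2 + sin(x1), the Node class, and the helper names) is made up for illustration and is not taken from the linked blogpost; it is a toy, not how real frameworks implement backpropagation.

```python
import math

# One node of the expression tree: a value, links to its input nodes, and the
# local derivatives d(node)/d(input) needed for the chain rule.
class Node:
    def __init__(self, value, parents=(), local_grads=()):
        self.value = value
        self.parents = parents
        self.local_grads = local_grads
        self.grad = 0.0  # filled in by reverse mode

def var(x):
    return Node(x)

def mul(a, b):
    return Node(a.value * b.value, (a, b), (b.value, a.value))

def add(a, b):
    return Node(a.value + b.value, (a, b), (1.0, 1.0))

def sin(a):
    return Node(math.sin(a.value), (a,), (math.cos(a.value),))

# Forward propagation: push the seed dx1/dx1 = 1 from one input through the
# tree; the derivative of every intermediate and output node with respect to
# x1 comes out along the way for ~free.
def forward_mode(x1, x2):
    dx1, dx2 = 1.0, 0.0                       # differentiating w.r.t. x1
    a, da = x1 * x2, dx1 * x2 + x1 * dx2      # product rule
    b, db = math.sin(x1), math.cos(x1) * dx1  # chain rule
    y, dy = a + b, da + db
    return y, dy                              # dy = ∂y/∂x1

# Backward propagation: build the tree once, then push the seed dy/dy = 1 from
# the output back toward the inputs; the derivative of y with respect to every
# input comes out along the way for ~free. (This simple traversal assumes a
# tree in which only leaves are shared; a general DAG needs a topological order.)
def reverse_mode(x1_val, x2_val):
    x1, x2 = var(x1_val), var(x2_val)
    y = add(mul(x1, x2), sin(x1))             # y = x1*x2 + sin(x1)
    y.grad = 1.0
    stack = [y]
    while stack:
        node = stack.pop()
        for parent, local in zip(node.parents, node.local_grads):
            parent.grad += node.grad * local  # chain rule, accumulated
            stack.append(parent)
    return y.value, x1.grad, x2.grad          # y, ∂y/∂x1, ∂y/∂x2

print(forward_mode(1.5, 2.0))  # (y, ∂y/∂x1)
print(reverse_mode(1.5, 2.0))  # (y, ∂y/∂x1, ∂y/∂x2)
```

Both modes agree on ∂y/∂x1, and reverse mode additionally returns ∂y/∂x2 from the same single backward pass, which is exactly why it wins when there are millions of inputs and only one output.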
In the case of forward propagation, these artifacts means you get ∂y_i/∂x_1 for ~free, and in backwards propagation you get ∂y_1/∂x_i for ~free.
Presumably you meant to say something else here than to repeat ∂y_i/∂x_1 twice?
Edit: Oops, now I see it: the i is switched. I really did look quite carefully for any difference, but apparently I still wasn't good enough. This all makes sense now.
I could barely see that despite always using a zoom level of 150%. So I’m sometimes baffled at the default zoom levels of sites like LessWrong, wondering if everyone just has way better eyes than me. I can barely read anything at 100% zoom, and certainly not that tiny difference in the formulas!
Our post font is pretty big, but for many reasons it IMO makes sense for the comment font to be smaller. So that plus LaTeX is a bit of a dicey combination.
It is hard to see; I changed it to n.