In the latest AI alignment podcast, Evan said the following (this is quoted from the transcript):

“But there are multiple possible channels through which information about the loss function can enter the model. So I’ll fundamentally distinguish between two different channels: information about the loss function can enter through the gradient descent process, or it can enter through the model’s input data.”
I’ve been trying to understand the distinction between those two channels. After reading a bunch about language models and neural networks, my best guess is that large neural networks are structured such that their internal state changes as they process input data, even outside of any learning process. So if a very sophisticated network like GPT-3 reads a bunch about Lord of the Rings, it will come to represent facts about the franchise internally, without gradient descent doing anything. That would be the “input data” channel.

Can someone tell me whether I got this right?
Yeah. There’s no gradient descent within a single episode, but if you have a network with input (as always) and with memory (e.g. an RNN), then its behavior in any given episode can be a complicated function of its input over time in that episode. You can describe that as “it figured something out from the input, and that’s now determining its further behavior”. Anyway, everything you said is right, I think.
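To make that concrete, here’s a minimal sketch in PyTorch (toy sizes, made-up data, not anyone’s actual setup) of the two channels: in the first, information about the loss reaches the model because gradient descent changes its weights; in the second, a recurrent network’s memory changes as it reads input within an episode, with no gradient update at all.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy recurrent model: 8-dim observations, 16-dim hidden state, scalar output.
rnn = nn.RNN(input_size=8, hidden_size=16, batch_first=True)
head = nn.Linear(16, 1)

# --- Channel 1: information enters via gradient descent (the weights change) ---
opt = torch.optim.SGD(list(rnn.parameters()) + list(head.parameters()), lr=0.1)
x = torch.randn(1, 5, 8)             # one training episode: 5 observations
target = torch.zeros(1, 5, 1)        # arbitrary toy target
out, _ = rnn(x)
loss = ((head(out) - target) ** 2).mean()
opt.zero_grad()
loss.backward()
opt.step()                           # the weights now encode something about the loss

# --- Channel 2: information enters via the input data (the hidden state changes) ---
with torch.no_grad():                # no learning at all inside the episode
    h = torch.zeros(1, 1, 16)        # initial memory
    for t in range(5):
        obs = torch.randn(1, 1, 8)   # what the network "reads" at step t
        _, h = rnn(obs, h)           # h now depends on everything read so far
# h shapes the network's later behavior in this episode, even though
# gradient descent never ran on any of these inputs.
```

In the second block the weights are frozen; everything the network has “figured out” about this episode lives in the hidden state h, which is the “input data” channel.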