Difficulty of Alignment
I find the prospect of training a model on just 40 parameters to be very interesting. Almost unbelievable, really, to the point where I’m tempted to say: “I notice that I’m confused”. Unfortunately, I don’t have access to the paper and it doesn’t seem to be on Sci-Hub, so I haven’t been able to resolve my confusion. Basically, my general intuition is that each parameter in a network probably only contributes a few bits of optimization power: it can be set fairly high, fairly low, or somewhere in between. So if you just pulled 40 random weights from the network, that’s maybe 120 bits of optimization power. That might be enough for MNIST, but probably not for anything more complicated. So I’m guessing that most likely a bunch of other optimization went into choosing exactly which 40-dimensional subspace to use. Of course, if we’re allowed to do that, then we could even do it with a 1-dimensional subspace: just pick the training trajectory as your subspace!
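For concreteness, here is a minimal sketch of what I understand the “train in a random low-dimensional subspace” setup to be (my own toy reconstruction, not the paper’s code; the model, data, and hyperparameters are all made up): freeze a random initialization and a random projection, and run gradient descent on only the 40 subspace coordinates.

```python
# Hedged sketch of random-subspace ("intrinsic dimension") training.
# Everything here is illustrative, not taken from the paper.
import numpy as np

rng = np.random.default_rng(0)

D = 784 * 10 + 10   # full parameter count of a toy MNIST logistic-regression model
d = 40              # dimension of the trainable subspace

theta0 = rng.normal(0.0, 0.01, size=D)      # fixed random initialization
P = rng.normal(size=(D, d)) / np.sqrt(D)    # fixed random projection matrix
z = np.zeros(d)                             # the only trainable parameters

def full_params(z):
    # The network's weights are confined to a d-dimensional affine subspace.
    return theta0 + P @ z

def loss_and_grad(z, X, y):
    # Toy softmax-regression loss; the gradient w.r.t. z comes from the chain
    # rule, dL/dz = P^T (dL/dtheta), since theta = theta0 + P z.
    theta = full_params(z)
    W, b = theta[:7840].reshape(784, 10), theta[7840:]
    logits = X @ W + b
    logits -= logits.max(axis=1, keepdims=True)
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)
    n = len(y)
    loss = -np.log(probs[np.arange(n), y] + 1e-12).mean()
    dlogits = probs
    dlogits[np.arange(n), y] -= 1.0
    dlogits /= n
    dtheta = np.concatenate([(X.T @ dlogits).ravel(), dlogits.sum(axis=0)])
    return loss, P.T @ dtheta

# Random data stands in for MNIST; only the 40 entries of z are ever updated.
X = rng.normal(size=(256, 784))
y = rng.integers(0, 10, size=256)
for step in range(100):
    loss, g = loss_and_grad(z, X, y)
    z -= 0.5 * g
```

The point of the sketch is just that the choice of projection P (and of d itself) is made outside the 40-parameter optimization, which is where I suspect the extra optimization power hides.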
Generally with the mindspace thing, I don’t really think about the absolute size or dimension of mindspace, but the relative size of “things we could build” and “things we could build that would have human values”. This relative size is measured in bits. So the intuition here would be that it takes a lot of bits to specify human values, and so the difference in size between these two is really big. Now maybe if you’re given Common Crawl, it takes fewer bits to point to human values within that big pile of information. But it’s probably still a lot of bits, and then the question is how do you actually construct such a pointer?
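To spell out the bits-counting (my own formalization of the intuition above, not something from the original discussion): if $B$ is the set of minds we could build and $A \subset B$ is the subset that would have human values, then a pointer that singles out $A$ costs roughly

$$
\log_2 \frac{|B|}{|A|} \ \text{bits},
$$

and the worry is that this quantity stays large even after conditioning on something like Common Crawl.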
Demons in Gradient Descent
I agree that demons are unlikely to be a problem, at least for basic gradient descent; if they were, they should have shown up in real training runs by now. I do still think gradient descent is a very unpredictable process (or to put it more precisely: we still don’t know how to predict gradient descent very well), and where that shows up is in generalization. We have a very poor sense of which things will generalize and which won’t, IMO.
For the 40 parameters thing, this link should work. See also this earlier paper.
BTW: the way I found that first link was by searching the title on Google Scholar, finding the paper, and clicking “All 5 versions” below (it’s right next to “Cited by 7” and “Related articles”). That brought me to a bunch of versions, one of which was a seemingly ungated PDF. This will probably work frequently, because AI researchers usually make their papers publicly available (at least in pre-print form).
Thanks for the link! Looks like they do put optimization effort into choosing the subspace, but it’s still interesting that the training process can be factored into two pieces like that.