David Scott Krueger (formerly: capybaralet) comments on [AN #141]: The case for practicing alignment work on GPT-3 and other large models

David Scott Krueger (formerly: capybaralet) 12 Mar 2021 2:28 UTC
LW: 3 AF: 3
AF
Interesting… Maybe this comes down to different taste or something. I understand, but don’t agree with, the cow analogy… I’m not sure why, but one difference is that I think we know more about cows than DNNs or something.

I haven’t thought about the Zipf-distributed thing.

> Taken literally, this is easy to do. Neural nets often get the right answer on never-before-seen data points, whereas Hutter’s model doesn’t. Presumably you mean something else but idk what.

I’d like to see Hutter’s model “translated” a bit to DNNs, e.g. by assuming they get anything right that’s within epsilon of a training data point or something… maybe it even ends up looking like the other model in that context…
- Rohin Shah 12 Mar 2021 3:36 UTC
  LW: 2 AF: 2
  AF Parent
  > I’d like to see Hutter’s model “translated” a bit to DNNs, e.g. by assuming they get anything right that’s within epsilon of a training data point or something
  With this assumption, asymptotically (i.e. with enough data) this becomes a nearest neighbor classifier. For the $d$-dimensional manifold assumption in the other model, you can apply the arguments from the other model to say that you scale as $D^{-c/d}$ in the dataset size $D$, for some constant $c$ (probably $c = 1$ or $2$, depending on what exactly we’re quantifying the scaling of).
  I’m not entirely sure how you’d generalize the Zipf assumption to the “within epsilon” case, since in the original model there was no assumption on the smoothness of the function being predicted (i.e. [0, 0, 0] and [0, 0, 0.000001] could have completely different values.)
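
To make the “within epsilon” reading concrete, here is a minimal simulation sketch, not anything taken from Hutter’s paper or the other model. It assumes inputs are uniform on the unit cube $[0, 1]^d$ (a stand-in for a $d$-dimensional data manifold), treats a test point as “gotten right” exactly when some training point lies within `eps` of it, and fits the slope of the mean nearest-neighbor distance against the dataset size on a log-log scale. The function names, the uniform distribution, and the chosen sizes are all illustrative assumptions. Under these assumptions the fitted slope should come out roughly at $-1/d$, i.e. the $c = 1$ case.

```python
import numpy as np


def nn_distances(num_train, dim, num_test=500, seed=0):
    """Distance from each test point to its nearest training point.

    Assumption: train and test points are i.i.d. uniform on [0, 1]^dim.
    """
    rng = np.random.default_rng(seed)
    train = rng.random((num_train, dim))
    test = rng.random((num_test, dim))
    # Pairwise Euclidean distances, shape (num_test, num_train).
    diffs = test[:, None, :] - train[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    return dists.min(axis=1)


def epsilon_error(num_train, dim, eps, **kwargs):
    """Fraction of test points with no training point within eps.

    This is the error rate of the hypothetical "right within epsilon" learner.
    """
    return float((nn_distances(num_train, dim, **kwargs) > eps).mean())


def fitted_exponent(dim, sizes=(250, 500, 1000, 2000, 4000)):
    """Slope of log(mean nearest-neighbor distance) against log(dataset size)."""
    log_sizes = np.log(sizes)
    log_dists = np.log([nn_distances(n, dim).mean() for n in sizes])
    slope, _intercept = np.polyfit(log_sizes, log_dists, 1)
    return float(slope)


if __name__ == "__main__":
    for dim in (1, 2, 3):
        print(f"d={dim}: fitted exponent {fitted_exponent(dim):.2f}, "
              f"compare with -1/d = {-1 / dim:.2f}")
    for n in (250, 1000, 4000):
        print(f"D={n}: epsilon error at eps=0.05, d=2: "
              f"{epsilon_error(n, 2, 0.05):.3f}")
```

Note that this sketch only covers the smooth, manifold-style side of the comparison. It says nothing about how to carry over the Zipf assumption, since nothing here models a target function that can change arbitrarily between nearby points.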