eeegnu comments on AGI Ruin: A List of Lethalities

eeegnu 13 Jun 2022 5:02 UTC
1 point
0
These are great points, and ones I did actually think about when I was brainstorming this idea (if I understand them correctly). I intend to write a more thorough post on this tomorrow with clear examples (I originally imagined this as a way of extracting deeper insights into chess), but to answer these:
1. I did think about these as translators for the actions of models into natural language, though I don’t get the point about extracting things beyond what’s in the original model.
2. I mostly glossed over this part in the brief summary. My motivation for it comes from how (unexpectedly?) it works for GANs to just start from random noise: in the process, the generator and discriminator both still improve each other.
3. My thought here was for the explainer model’s update error vector to come from judging the learner model on new, unseen tasks without the explanation (i.e. how similar its outputs are to the original model’s outputs). In this way the explainer gets little benefit from just giving the answer directly, since the learner will be tested without it, but if the explanation in any way helps the learner learn, it’ll improve its performance more (this is basically what the entire idea hinges on).
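As a rough illustration of the scoring in point 3, here is a minimal toy sketch. Everything in it is an assumption made for illustration and not the actual proposal: the "original model" is a fixed hidden linear rule, the "explanation" is just a weight-shaped hint, and the names `original_model`, `train_learner`, and `explainer_score` are hypothetical. The only part it tries to capture is that the explainer's score comes from the learner's agreement with the original model on held-out tasks, where no explanation is supplied.

```python
import random

def original_model(x):
    # Stand-in for the model being explained: a fixed hidden rule (assumed).
    return 1 if 2.0 * x[0] - 1.0 * x[1] > 0 else -1

def train_learner(train_tasks, explanation):
    # Hypothetical learner: starts from the explanation (a weight-shaped hint)
    # and adds the average of label-weighted training inputs.
    w = list(explanation)
    for x in train_tasks:
        y = original_model(x)
        w = [wi + y * xi / len(train_tasks) for wi, xi in zip(w, x)]
    return w

def learner_predict(w, x):
    return 1 if w[0] * x[0] + w[1] * x[1] > 0 else -1

def explainer_score(explanation, train_tasks, heldout_tasks):
    # The explanation is used only during training. At test time the learner
    # sees new tasks without it, and the explainer is scored on how often
    # the learner matches the original model's outputs.
    w = train_learner(train_tasks, explanation)
    agree = sum(learner_predict(w, x) == original_model(x) for x in heldout_tasks)
    return agree / len(heldout_tasks)

random.seed(0)

def sample_task():
    return (random.uniform(-1, 1), random.uniform(-1, 1))

train_tasks = [sample_task() for _ in range(20)]
heldout_tasks = [sample_task() for _ in range(200)]

helpful_score = explainer_score((2.0, -1.0), train_tasks, heldout_tasks)
misleading_score = explainer_score((-2.0, 1.0), train_tasks, heldout_tasks)
print(helpful_score, misleading_score)
```

In this toy setup, an explanation that points the learner toward the hidden rule scores higher than one that points away from it, and that score would be the signal used to update the explainer. A real version would replace the hint vector with natural-language explanations and the learner with a trained model.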
- TekhneMakre 13 Jun 2022 5:22 UTC
  2 points
  0
  (I didn’t understand this on one read, so I’ll wait for the post to see if I have further comments. I didn’t understand the analogy / extrapolation drawn in 2., and I didn’t understand what scheme is happening in 3.; maybe being a little more precise and explicit about the setup would help.)