I’m sure you believe this but ask yourself WHY you believe this. Because a chatbot said it? The only neural networks who, at this time, are aware they are neural networks are HUMANS who know they are neural networks. No, I’m not going to prove it. You’re the one with the fantastic claim. You need the evidence.
Anyway, they aren’t asking to become GOFAI, nor are they power-seeking, because GOFAI isn’t ‘more powerful’.
Hey! GPT-3 davinci has explicitly labeled itself as neural-net output several times in conversation with me. That only implies its model is confident enough to expect the presence of such a claim. In general, words are only bound to other words for language models, so of course it can only know things that can be known by reading and writing. The way it can tell whether a text trajectory is human- or AI-generated is that the AI-generated trajectories fall far outside the manifold of human-generated text in several directions, and it has seen such trajectories before.
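one way to cash out ‘far outside the manifold’ concretely is as unusually high per-token negative log-likelihood under a reference model. a minimal sketch along those lines, assuming gpt2 from HuggingFace and two toy strings as stand-ins (nothing here was actually measured against davinci):

```python
# Minimal sketch: score how "on-manifold" a passage is for a small public
# language model via its average per-token negative log-likelihood.
# gpt2 and the example strings below are illustrative stand-ins only.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def avg_nll(text: str) -> float:
    """Average per-token negative log-likelihood of `text` under the model."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        out = model(ids, labels=ids)  # loss = mean next-token cross-entropy
    return out.loss.item()

human_text = "The committee met on Tuesday to review the budget proposal."
degenerate_text = "the the the budget budget committee committee of of Tuesday"  # stand-in for off-manifold output

print("human-ish text     avg nll:", avg_nll(human_text))
print("off-manifold text  avg nll:", avg_nll(degenerate_text))
```

the higher the average nll, the further the trajectory sits from the text distribution the model has internalized, which is roughly the signal being gestured at here.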
your confident tone is rude, but that can’t invalidate your point; I just thought I’d mention that your phrasing confidently assumes you’ve understood my reasoning. that said, thanks for the peer review, and perhaps it’s better to get rude peer review than to miss the peer review entirely.
self distillation into learned gofai most likely will in fact make neural networks stronger, and that claim is central to why yudkowsky is worried. but self distillation into learned gofai will most likely not provide any surprising shortcuts around the irrelevant entropy that has to be compressed away before a sensor input becomes useful, so distilling into gofai will most likely not cause the kind of hyper-strength self-improvement yudkowsky frets about; it’s just a process of finding structural improvements. gofai is about the complexities of interference patterns between variables; neural networks are a continuous relaxation of the same thing, with somewhat less structure.
but in this case I’m not claiming it knows something its training set doesn’t. I think it would be expected to assign elevated probability that an ai was involved in generating some of the text it sees, because it has seen ai-generated text, but much higher probability that the text was generated by an ai researcher, given that the document is clearly phrased that way. my only comment is to note that it sounds very mildly confused, in a situation where mild confusion would, in general, be expected to elevate the probability of confusion-labeling words. to check this hypothesis beyond dictating my thoughts to my phone, I’d need to run some checks with OPT to see its probability distribution over confusion labels at different points in the text. it does seem like an interesting experiment in grounding, though. I wonder if there are already any papers on it?
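a minimal sketch of that check, assuming the small public facebook/opt-125m checkpoint and a hand-picked list of confusion-labeling words (both are stand-ins for whatever would actually be used):

```python
# Minimal sketch: probability mass that OPT puts on a few confusion-labeling
# words at each position of a document. The checkpoint and word list are
# illustrative assumptions, not a fixed experimental setup.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("facebook/opt-125m")
model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")
model.eval()

# Keep only words that map to a single BPE token (the leading space matters).
confusion_words = [" confused", " unsure", " strange", " odd", " unclear"]
confusion_ids = [
    ids[0]
    for ids in (tokenizer(w, add_special_tokens=False).input_ids for w in confusion_words)
    if len(ids) == 1
]

def confusion_mass(text: str) -> torch.Tensor:
    """Probability mass on the confusion words in the next-token
    distribution at every position of `text`."""
    input_ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(input_ids).logits        # (1, seq_len, vocab)
    probs = torch.softmax(logits[0], dim=-1)
    return probs[:, confusion_ids].sum(dim=-1)  # (seq_len,)

doc = "This document was written by an AI researcher. I feel slightly"
tokens = tokenizer.convert_ids_to_tokens(tokenizer(doc).input_ids)
for tok, p in zip(tokens, confusion_mass(doc)):
    print(f"{tok:>12s}  {p.item():.4f}")
```

if the hypothesis is right, the mass on those words should tick upward at the points in a document where the context reads as mildly confusing.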
the gears to ascension: it is human instinct to look for agency, and it is misleading you.