This approach is alignment by bootstrapping. To use it, you need some agent able to tag all the text in the training set with many different categories.
Pre-GPT-4, how could you do this?
You could also use combinations: develop a “clean” agent able to emit only the text you find desirable/smart, and then use it to re-evaluate all the text on the internet. Double distillation, essentially.
You could also have GPT-4 use its web-browsing capabilities and scientific-journal access to research any text it analyzes and categorize it by factual accuracy.
Note also that tags can be relative: you scale your weight updates and loss penalty so the model incurs smaller weight changes/penalties for failing to reproduce “bad” text correctly.
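A minimal sketch of what tag-relative loss weighting could look like, assuming each training document carries a quality tag (the tag names and weight values here are illustrative, not from any actual training setup): the next-token loss is simply scaled down for tokens drawn from "bad" text.

```python
import math

# Hypothetical tag -> loss-weight mapping: "bad" text gets a smaller weight,
# so the model is penalized less for failing to reproduce it exactly.
TAG_WEIGHTS = {"good": 1.0, "neutral": 0.5, "bad": 0.1}

def cross_entropy(probs, target_index):
    """Standard LM loss: negative log-likelihood of the correct next token."""
    return -math.log(probs[target_index])

def weighted_token_loss(probs, target_index, tag):
    """Scale the usual token loss by the weight of the document's tag."""
    return TAG_WEIGHTS[tag] * cross_entropy(probs, target_index)
```

With this, the same prediction error on a "bad" document contributes one-tenth the gradient signal of an error on a "good" one, so the model still sees the bad text but is pushed much less strongly to imitate it.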
It’s like every other human technology: we couldn’t get to clean forms of industry without first using crude, dirty, and dangerous methods.
If you read the paper, they tried several methods along those lines, none of which ended up working as well as the really simple conditional-training approach where you just train the model to label bad text as bad. It is of course possible that someone will come up with another approach in this vein that works better, but it seems to be hard.
This approach is alignment by bootstrapping. To use it, you need some agent able to tag all the text in the training set with many different categories.
Pre-GPT-4, how could you do this?
Well, humans created all of the training data on our own, so it should be possible to add the necessary structured data to it! There are large-scale crowdsourced efforts like Wikipedia. Extending Wikipedia, and a section of the internet, with enhancements like associating structured data with unstructured data, plus a reputation-weighted voting system to judge contributions, seems achievable. You could even use models to pre-label the data but have it verified by humans at large scale (or in semi-automated or fully automated, but non-AI, ways). This is what I’m trying to do with Web 10. Geo is the Web3 version of this, and the only other major similar initiative I’m aware of.
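The reputation-weighted voting piece could work along these lines (a sketch under assumed names and an assumed acceptance threshold, not a description of Web 10's or Geo's actual mechanism): each voter's vote on a contributed label counts in proportion to their reputation score.

```python
def accept_contribution(votes, reputation, threshold=0.5):
    """Decide whether a contributed label is accepted.

    votes: {user: True/False} -- each user's vote on the contribution.
    reputation: {user: non-negative score} -- each user's earned reputation.
    Accepts when the reputation-weighted fraction in favor meets the threshold.
    """
    total = sum(reputation[u] for u in votes)
    if total == 0:
        return False  # no reputation behind any vote: reject by default
    in_favor = sum(reputation[u] for u, v in votes.items() if v)
    return in_favor / total >= threshold
```

The design choice here is that a handful of high-reputation contributors can outweigh many low-reputation (or freshly created) accounts, which is what makes crowdsourced labeling harder to game than one-account-one-vote.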