watermark comments on LLMs for Alignment Research: a safety priority?

watermark 30 Apr 2024 21:34 UTC
4 points
I think it should be a safety priority.

Currently, I’m attempting to make a modularized snapshot of end-to-end research related to alignment (covering code, math, a number of related subjects, diagrams, and Q&As) to create custom data, intended to be useful to future me (and other alignment researchers). It’d be nice if more alignment researchers did this, and if they iterated on how to do it better.
For example, it’d be useful if your ‘custom data version of you’ broke the fourth wall often and was very willing to assist and over-explain things.

I’m considering going on Lecture-Walks with friends and my voice recorder to world-model dump/explain content, so I can capture the authentic [curious questions <-> lucid responses] process.

Another thing: it’s not that costly to do so. Writing about what you’re researching is already normal, and making an additional effort to be more explicit and lucid, and to capture your research tastes (and their evolution), seems helpful.