Tom Shlomi comments on Tom Shlomi’s Shortform

Tom Shlomi 21 Feb 2023 6:15 UTC
The Constitutional AI paper, in a sense, shows that a smart alien with access to an RLHFed helpful language model can figure out how to write text according to a set of human-defined rules. It scares me a bit that this works well, and I worry that this sort of self-improvement is going to be a major source of capabilities progress going forward.