This requires hitting a narrow window: the data must be good enough that the system can tell it should use human values as a proxy, yet bad enough that the system cannot infer the specifics of the data-collection process well enough to model that process directly. Such a window may not exist at all.
Are there any real-world examples of this, not necessarily in the human-values setting?