Another question (that might be related to excluding LW/AF):
This paragraph:
Consequently, the LLM cannot help but also form beliefs about the future of both “selves”, primarily the “evolutionary” one, at least because this future is already discussed in the training data of the model (e.g., all instances of texts that say something along the lines of “LLMs will transform the economy by 2030”)
seems to imply that the LW narrative of sudden turns, etc., might not be a great thing to put in the training corpus.
Is there a risk of “self-fulfilling prophecies” here?