[Linkpost] Value extraction via language model abduction


This is my first post on LessWrong. I’ll merely be linkposting content on epistemics and alignment here while getting more familiar with the culture.

tl;dr:

We attempt to automatically infer a person’s beliefs from their writing in three different ways. Initial results based on Twitter data hint that embeddings and language models are particularly promising approaches.