Potential topics: what topics besides corrigibility could we collaborate on in future weeks? Also, are we able to poll users for topics on the site?
Timelines and forecasting
Goodhart’s law
Power-seeking
Human values
Learning from human feedback
Pivotal actions
Bootstrapping alignment
Embedded agency
Primer on language models, reinforcement learning, or machine learning basics
This one’s not really on-topic, but I do see value in a more “getting up to date” focus, where experts give talks or share references for learning things (e.g., “here’s a tutorial for implementing a small GPT-2”). I could periodically ask LW questions on whatever topic interests me at the moment, or just do my own Google search, but I think some community value would be lost that way: learning and teaching together makes it easier for the community to coordinate in the future. Plus, there’s the bonus of making connections.