Logan Riggs comments on Solve Corrigibility Week

Logan Riggs 28 Nov 2021 17:02 UTC
LW: 3 AF: 2
- Timelines and forecasting
- Goodhart’s law
- Power-seeking
- Human values
- Learning from human feedback
- Pivotal actions
- Bootstrapping alignment
- Embedded agency
- Primer on language models, reinforcement learning, or machine learning basics
  - This one’s not really on-topic, but I do see value in a more “getting up to date” focus where experts can give talks or references to learn things (e.g. “here’s a tutorial for implementing a small GPT-2”). I could just periodically ask LW questions on whatever topic ends up interesting me at the moment, or do my own Google search, but I feel there’s some community value that wouldn’t be gained that way. Learning and teaching together makes it easier for the community to coordinate in the future. Plus connection bonuses.