Brendon_Wong comments on Some for-profit AI alignment org ideas

Brendon_Wong 14 Dec 2023 16:32 UTC
3 points
Great article! Just reached out. A couple of ideas I want to mention: one is working on safer models directly (example: https://www.lesswrong.com/posts/JviYwAk5AfBR7HhEn/how-to-control-an-llm-s-behavior-why-my-p-doom-went-down-1), which for smaller models might not be cost-prohibitive to make progress on. Another is building safety-related cognitive architecture components that have commercial uses, such as world model work (example: https://www.lesswrong.com/posts/nqFS7h8BE6ucTtpoL/let-s-buy-out-cyc-for-use-in-agi-interpretability-systems) or memory systems (example: https://www.lesswrong.com/posts/FKE6cAzQxEK4QH9fC/qnr-prospects-are-important-for-ai-alignment-research). My work is trying to do a few of these things concurrently (https://www.lesswrong.com/posts/caeXurgTwKDpSG4Nh/safety-first-agents-architectures-are-a-promising-path-to).
- Eric Ho 14 Dec 2023 17:35 UTC
  1 point
  Responded! And thanks for sharing, will check out those posts. Really enjoyed the first one, which I read the other day.