Great article! Just reached out. A couple ideas I want to mention are working on safer models directly (example: https://www.lesswrong.com/posts/JviYwAk5AfBR7HhEn/how-to-control-an-llm-s-behavior-why-my-p-doom-went-down-1), which for smaller models might not be cost prohibitive to make progress on. There’s also building safety-related cognitive architecture components that have commercial uses. For example, world model work (example: https://www.lesswrong.com/posts/nqFS7h8BE6ucTtpoL/let-s-buy-out-cyc-for-use-in-agi-interpretability-systems) or memory systems (example: https://www.lesswrong.com/posts/FKE6cAzQxEK4QH9fC/qnr-prospects-are-important-for-ai-alignment-research). My work is trying to do a few of these things concurrently (https://www.lesswrong.com/posts/caeXurgTwKDpSG4Nh/safety-first-agents-architectures-are-a-promising-path-to).
Responded! And thanks for sharing, will check out those posts. Really enjoyed the first one which I read the other day.
Great article! Just reached out. A couple ideas I want to mention are working on safer models directly (example: https://www.lesswrong.com/posts/JviYwAk5AfBR7HhEn/how-to-control-an-llm-s-behavior-why-my-p-doom-went-down-1), which for smaller models might not be cost prohibitive to make progress on. There’s also building safety-related cognitive architecture components that have commercial uses. For example, world model work (example: https://www.lesswrong.com/posts/nqFS7h8BE6ucTtpoL/let-s-buy-out-cyc-for-use-in-agi-interpretability-systems) or memory systems (example: https://www.lesswrong.com/posts/FKE6cAzQxEK4QH9fC/qnr-prospects-are-important-for-ai-alignment-research). My work is trying to do a few of these things concurrently (https://www.lesswrong.com/posts/caeXurgTwKDpSG4Nh/safety-first-agents-architectures-are-a-promising-path-to).
Responded! And thanks for sharing, will check out those posts. Really enjoyed the first one which I read the other day.