(Others had already written on these and related topics previously, e.g.:
https://www.lesswrong.com/posts/r3xwHzMmMf25peeHE/the-translucent-thoughts-hypotheses-and-their-implications
https://www.lesswrong.com/posts/FRRb6Gqem8k69ocbi/externalized-reasoning-oversight-a-research-direction-for
https://www.lesswrong.com/posts/fRSj2W4Fjje8rQWm9/thoughts-on-sharing-information-about-language-model#Accelerating_LM_agents_seems_neutral__or_maybe_positive_
https://www.lesswrong.com/posts/dcoxvEhAfYcov2LA6/agentized-llms-will-change-the-alignment-landscape
https://www.lesswrong.com/posts/ogHr8SvGqg9pW5wsT/capabilities-and-alignment-of-llm-cognitive-architectures
https://intelligence.org/visible/.)
I’ve made some attempts over the past year to (at least somewhat) raise the profile of this kind of approach and its potential, especially as applied to automated safety research (often in conversation with, or prompted by, Ryan Greenblatt, and much of it stemming from my winter ’24 Astra Fellowship with @evhub):
https://www.lesswrong.com/posts/HmQGHGCnvmpCNDBjc/current-ais-provide-nearly-no-data-relevant-to-agi-alignment#mcA57W6YK6a2TGaE2
https://www.lesswrong.com/posts/yQSmcfN4kA7rATHGK/many-arguments-for-ai-x-risk-are-wrong?commentId=HiHSizJB7eDN9CRFw
https://www.lesswrong.com/posts/yQSmcfN4kA7rATHGK/many-arguments-for-ai-x-risk-are-wrong?commentId=KGPExCE8mvmZNQE8E
https://www.lesswrong.com/posts/yQSmcfN4kA7rATHGK/many-arguments-for-ai-x-risk-are-wrong?commentId=zm4zQYLBf9m8zNbnx
https://www.lesswrong.com/posts/yQSmcfN4kA7rATHGK/many-arguments-for-ai-x-risk-are-wrong?commentId=TDJFetyynQzDoykEL
I might write a high-level post at some point, which should hopefully do more for the visibility of this kind of agenda than scattered comments on various other (not necessarily closely related) posts.