Have you read Eric Drexler’s work on open agencies, and on applying open agencies to present-day LLMs? Open agencies seem like progress toward a safer design for current and future cognitive architectures. Drexler’s design touches on some of the aspects you mention in the post, like:
The system can be coded to both check itself against its goals and invite human inspection if it judges that it is considering plans or actions that may violate its ethical goals, change its goals, or remove it from human control.
I had read the first but hadn’t seen the second. I just read it. It’s well written and highly relevant, and I’ll be citing it in my future work. Thanks much for the reference!