The humans ask the AI to produce design tools, rather than designs (...) we can piecewise verify that the tool is making accurate predictions (...) The way in which this attacks a central difficulty is by making it harder for the AI to just build unhelpful nanotech
I think this is a good way to put things, and it’s a concept that can be made more general and built upon.
Like, we can also have AIs produce:
Tools that make other tools
Tools that help to verify other tools
Tools that look for problems with other tools (in ways that don’t guarantee finding all problems, but can help find many)
Tools that help approximate brain emulations (or get us part of the way there), or predict what a human would say when responding to questions in some restricted domain
Etc., etc.
Maybe you have already thought through such strategies very extensively, but AFAIK you don’t make that clear in any of your writings, and the inferential distance required to realize the full power of techniques like these is not trivial.
I have written more about this concept in this post in this series. I’m not sure whether any of the ideas in the series are new, but it seems to me that at least several of them are under-discussed.