Even though language models are impressive, and it’s worth flagging that you could try amplification with language models plus something like chain-of-thought prompting or AutoGPT’s task-breakdown prompts, I still think the typical IDA architecture is too prone to essentially training the model to hack itself. Heck, I’m worried that if you arranged humans in an IDA architecture, the humans would effectively “hack themselves.”
But given the suitability of language models for things even sorta like IDA, you’re right to bring this up, and maybe there’s something clever nearby that we should be searching for.
Maybe.[1]
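
For concreteness, here’s roughly the shape of thing I have in mind by “amplification with task-breakdown prompts”: a toy sketch, not anyone’s actual implementation, with `call_model` standing in for a hypothetical model interface you’d have to supply yourself.

```python
# A minimal sketch of one amplification step via task decomposition,
# AutoGPT-style. Nothing here is a real library API; `call_model` is a
# hypothetical placeholder for whatever LM interface you actually have.

def call_model(prompt: str) -> str:
    """Hypothetical LM call -- swap in your own model/API wrapper."""
    raise NotImplementedError("plug in a real language model here")

def amplify(question: str, depth: int = 2) -> str:
    """Answer `question` by decomposing it into subquestions, answering
    those recursively with a smaller budget, then composing the results."""
    if depth == 0:
        # Base case: the unamplified model answers directly.
        return call_model(f"Answer concisely: {question}")

    # Ask the model to break the task down, one subquestion per line.
    raw = call_model(
        "List up to 3 subquestions whose answers would help answer:\n"
        f"{question}\nOne subquestion per line."
    )
    subquestions = [line.strip() for line in raw.splitlines() if line.strip()]

    # Recursively answer each subquestion at reduced depth.
    subanswers = [amplify(q, depth - 1) for q in subquestions]

    # Compose: the model answers the original question given the sub-results.
    context = "\n".join(
        f"Q: {q}\nA: {a}" for q, a in zip(subquestions, subanswers)
    )
    return call_model(
        f"Given these subquestion answers:\n{context}\nNow answer: {question}"
    )
```

The worry above is exactly about what you get when you distill the outputs of a loop like this back into the model.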