Thanks for the response! I think you give a good summary of the issues I have with this report. You evaluate "does our agent definitely do the thing," whereas I think the important question is "can any agent ever do the thing" (within a reasonable number of tries and with reasonable assistance). Perhaps you can expand on your justification for this: are these dangerous capabilities going to be first exhibited in the real world by your agent running at T=0?
Considering the abilities of model-human hybrids also seems valuable. ARA agents may be created by an AI engineer using their model to improve itself. Ultimately, what matters is whether you end up with recursive self-improvement, not whether the model did A-Z by itself.
Thanks for clarifying; I did actually read the report and the task specifications before running the experiments and commenting.