I do plan to test Claude, but first I need to find funding, work out how many test iterations are enough for sampling, and add new values and tasks. In the future I plan to build a solid benchmark for testing identity management and run it on all available models, but that will take some time.
I do wonder how Claude would fare on these tasks given that these phrases are in its Constitution: