i’ve recently done more work with AI agents, and re: agents running amok, i’ve found Claude was actually more aligned: it did stuff i asked it not to do much less often than oai models, enough that it actually made a difference lol
lol what? Can you compile/summarize a list of examples of AI agents running amok in your personal experience? To what extent was it an alignment problem vs. a capabilities problem?
not running amok, just not reliably following instructions like “only modify files in this folder” or “don’t install pip packages”. Claude follows instructions correctly; some other models are mode-collapsed into a certain way of doing things, e.g. gpt-4o always thinks it’s running python in the ChatGPT code interpreter, and you need very strong prompting to make it behave in a way specific to your computer
a hypothetical typical example: it tries to use /usr/bin/python because it’s memorized that that’s the path to python; that fails, so it concludes it must create that path, which would require sudo permissions, and if it has them it could potentially mess something up