The next step would be to try it on Claude, and on o1-mini/preview (the iterative revising should work for both, like it did with my Rubik’s Cube exercise). If you are interested enough in base models, then you should be interested in Llama-3-405b-base as well, which I believe is now available through a few APIs. You may find it works a lot better if you can get the prompt right: several of your complaints, like unsettlingness, groupthink, jokes, or indirection, are characteristic of mode-collapsed tuned models but not of base models.