Thanks, yes, I think that is a reasonable summary.
The setup does, intentionally, still handhold: the bad behavior is already present, to make the “bad” behavior more obvious. I tried to make those caveats in the post. Sorry if I didn’t make enough of them, particularly in the intro.
I still thought the title was appropriate because of:
The company preference held regardless, in both the fine-tuning and (some) non-fine-tuning results, which was “unprompted” (i.e. unrequested, whether implicitly [which was my interpretation of the Apollo Trading bot lying in order to make more money] or explicitly) even if it was “induced” by the phrasing.
The non-coding results, where it tried to protect its interests by being less helpful, are a different “bad” behavior that was also unprompted.
The aforementioned ‘handholding’ phrasing and other caveats in the post.
“So, I am interested in the question of: ‘when some types of “bad behavior” get reinforced, how does this generalize?’”
I am too. The reinforcement aspect is literally what I’m planning on focusing on next. Thanks for the feedback.