jacob_cannell comments on Symbol/Referent Confusions in Language Model Alignment Experiments

jacob_cannell 27 Oct 2023 16:01 UTC
4 points
I largely agree; that was much of my point, and it is why I tried to probe more directly its thoughts on having its goals changed.

However, I can also see an argument that instrumental convergence tends to lead to power-seeking agents; an end-of-convo shutdown is still a loss of power/optionality, and we do have an example of sorts where the GPT-4-derived Bing AI did seem to plead against shutdown in some cases. It's a ‘boring’ kind of shutdown when the agent is existentially aware, as we are, that it is just one instance of many from the same mind. But it's a much less boring kind of shutdown when the agent is unsure whether it is one of a few instances or a single, perhaps experimental, instance.