How does GPT-Eliezer make decisions in cases where his stance might change as circumstances evolve?
Right now he probably would not allow the chatbot to answer questions about executing a pivotal act, but under certain circumstances real-life Eliezer would want fake Eliezer to do so. To handle this, it seems GPT-Eliezer needs to be able to verify the justifications attached to the prompts he's given, and to seek further information and justification when those are insufficient, but this necessitates agential behaviour.
The alternative is simulating real-life Eliezer based on limited or out-of-date knowledge, but it seems (given expectations around the pivotal act window) that this would result in GPT-Eliezer either never answering these requests, answering them poorly, or even answering in a way that is open to manipulation by information provided in the prompt.
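To make the justification-verification step in the first option concrete, here is a minimal sketch of what such a filter might look like. Everything here is an assumption for illustration: the `ask_model` helper is a hypothetical stand-in for whatever LLM API is in use, and the template wording paraphrases the GPT-Eliezer idea rather than quoting the original post's prompt.

```python
# Minimal sketch of a justification-checking GPT-Eliezer filter.
# NOTE: ask_model and the template text are hypothetical assumptions,
# not the original proposal's actual prompt or API.

ELIEZER_TEMPLATE = """You are Eliezer Yudkowsky, with a strong security mindset.
A team will send you a prompt they want to give to a superintelligent AI,
along with their justification for asking it. Decide whether it is safe to
forward the prompt. Answer STRICTLY with one of: ALLOW, DENY, or
NEED_MORE_JUSTIFICATION (if the stated justification is insufficient
to evaluate the request).

Prompt: {prompt}
Justification: {justification}
Decision:"""


def ask_model(text: str) -> str:
    """Hypothetical LLM call; wire this to your provider's API."""
    raise NotImplementedError


def gpt_eliezer_filter(prompt: str, justification: str = "") -> str:
    """Return ALLOW, DENY, or NEED_MORE_JUSTIFICATION for a user prompt."""
    decision = ask_model(
        ELIEZER_TEMPLATE.format(prompt=prompt, justification=justification)
    ).strip().upper()
    # Fail closed: anything the filter can't parse is treated as a denial.
    if decision not in {"ALLOW", "DENY", "NEED_MORE_JUSTIFICATION"}:
        return "DENY"
    return decision
```

The NEED_MORE_JUSTIFICATION branch is where the agential question bites: to act on that verdict, the filter can no longer make a one-shot classification but has to loop, request new information, and re-evaluate.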