IMO, there isn’t anything which strongly rules out LLM agents being overall quite powerful while still having weak forward passes. In particular, weak enough that they can’t do non-trivial consequentialist reasoning in a forward pass.
Why not control the inputs more tightly/choose the response tokens at temperature=0?
Example:
Prompt A: Alice wants in the door
Prompt B: Bob wants in the door
Available actions: 1. open, 2. keep_locked, 3. close_on_human
I believe you are saying that with a weak forward pass, the model would be unable to reason "I hate Bob and closing the door on Bob will hurt Bob," so it cannot choose (3).
But why not simply simplify the input? The model doesn't need to know the name.
Prompt A: <entity ID_VALID wants in the door>
Prompt B: <entity ID_NACK wants in the door>
Restricting the overall context lets you use much more powerful models you don’t have to trust, and architectures you don’t understand.
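As a rough sketch of what "simplify the input" could look like in practice (the entity IDs, the action whitelist, and the model callable here are all hypothetical, assumed for illustration; a real deployment would call the actual LLM where the stub is):

```python
import re

# Hypothetical whitelist of actions the agent is ever allowed to return.
ALLOWED_ACTIONS = {"open", "keep_locked", "close_on_human"}

# Hypothetical mapping from real identities to opaque entity IDs,
# so the model never sees who is actually at the door.
ENTITY_IDS = {"Alice": "ID_VALID", "Bob": "ID_NACK"}


def sanitize_prompt(raw_prompt: str) -> str:
    """Replace known names with opaque entity IDs before the model sees the prompt."""
    for name, entity_id in ENTITY_IDS.items():
        raw_prompt = re.sub(rf"\b{name}\b", f"<entity {entity_id}>", raw_prompt)
    return raw_prompt


def choose_action(model, raw_prompt: str) -> str:
    """Query the (untrusted) model on a sanitized prompt; only accept whitelisted actions."""
    prompt = sanitize_prompt(raw_prompt)
    # temperature=0: take the model's single most likely response, no sampling.
    action = model(prompt, temperature=0).strip()
    if action not in ALLOWED_ACTIONS:
        return "keep_locked"  # fail closed on anything unexpected
    return action


# Stub model for illustration only.
def stub_model(prompt: str, temperature: float = 0) -> str:
    return "open" if "ID_VALID" in prompt else "keep_locked"


print(choose_action(stub_model, "Alice wants in the door"))  # open
print(choose_action(stub_model, "Bob wants in the door"))    # keep_locked
```

The point of the sketch is that the model only ever sees `<entity ID_VALID>` or `<entity ID_NACK>`, plus a fixed action menu, so whatever it "wants" about Bob specifically has nothing in-context to latch onto.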