tailcalled comments on the case for CoT unfaithfulness is overstated

tailcalled 2 Oct 2024 17:12 UTC
2 points
0
My threat model is entirely different: Even if human organizations are misaligned today, the human organizations rely primarily on human work, and so they pass tons of resources and discretion on to humans, and try to ensure that the humans are vigorous.
Meanwhile, under @mattmacdermott ’s alignment proposal, one would get rid of the humans and pass tons of resources and discretion on to LLMs. Whether one values the LLMs is up to you, but if human judgement and resource ownership is removed, obviously that means humans lose control, unless one can change organizations to give control to non-employees instead of employees (and it’s questionable how meaningful that is).