One-boxing on Newcomb’s Problem is good news IMO. Why do you believe it’s bad?
It basically comes down to the fact that agents running sufficiently powerful decision theories like FDT or UDT can still end up deceptively aligned, even if myopia is retained by default.
That's the worry with one-boxing in Newcomb's problem: one-boxing means the agent is optimizing over the correlation between its policy and the predictor's model of it, rather than just the causal effects of its current action, and that's exactly the kind of non-myopic reasoning deceptive alignment relies on. So it implies that our GPTs could very well become deceptively aligned.
Link below:
https://www.lesswrong.com/posts/LCLBnmwdxkkz5fNvH/open-problems-with-myopia
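To make the one-boxing point concrete, here's a minimal payoff comparison I'm adding (standard Newcomb amounts, with an illustrative 0.99 predictor accuracy and an illustrative 0.5 prior for the causal view). An agent that treats its decision as correlated with the prediction, the way FDT/UDT-style reasoning does, one-boxes; a purely causal evaluation of the current act two-boxes.

```python
# Newcomb's problem, standard payoffs: the opaque box holds $1,000,000 only if
# the predictor forecast one-boxing; the transparent box always holds $1,000.
# The predictor accuracy P = 0.99 is illustrative.

P = 0.99  # probability the predictor correctly forecast the agent's choice

def ev_correlated(action):
    """Expected value when the decision is treated as correlated with the
    prediction (the FDT/UDT-style evaluation that recommends one-boxing)."""
    if action == "one-box":
        return P * 1_000_000 + (1 - P) * 0
    # Two-box: with probability P the predictor saw it coming and the opaque box is empty.
    return P * 1_000 + (1 - P) * (1_000_000 + 1_000)

def ev_causal(action, prob_box_full):
    """Expected value when the prediction is treated as already fixed
    (the CDT-style evaluation, under which two-boxing dominates)."""
    base = prob_box_full * 1_000_000
    return base + (1_000 if action == "two-box" else 0)

for action in ("one-box", "two-box"):
    print(f"correlated view, {action}: ${ev_correlated(action):,.0f}")
    print(f"causal view (box full with prob 0.5), {action}: ${ev_causal(action, 0.5):,.0f}")
```

The relevant point is that a model whose behaviour matches the first evaluation is reasoning about how its policy correlates with a predictor of it, which is the kind of non-myopic reasoning the linked post worries about.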
LCDT, on the other hand, does prevent deception, assuming it's implemented correctly.
Link below:
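Here's a toy sketch of the mechanism as I understand it (my own illustration, not the post's formalism; the node names, payoffs, and the 0.5 prior are hypothetical): when evaluating a decision, an LCDT agent cuts the causal link from that decision to every node it models as an agent, so a plan whose entire benefit comes from manipulating an overseer loses its appeal.

```python
# Toy illustration of the core LCDT move: when scoring an action, sever the
# causal link from the decision to every node modelled as an agent (here, the
# overseer) and evaluate that node under a fixed prior instead.
# All names and payoffs below are hypothetical.

ACTIONS = ("report_honestly", "deceive_overseer")

def deception_cost(action):
    # Crafting a false report takes a little extra effort.
    return 1 if action == "deceive_overseer" else 0

def overseer_approves(action):
    # In this toy world the work was mediocre, so only the false report wins approval.
    return action == "deceive_overseer"

def reward(action, approved):
    return (10 if approved else 0) - deception_cost(action)

def expected_utility(action, lcdt):
    if lcdt:
        # LCDT: the overseer is an agent node, so its approval is scored under a
        # fixed prior rather than as a downstream effect of this action.
        prior = 0.5  # illustrative prior over the overseer's approval
        return prior * reward(action, True) + (1 - prior) * reward(action, False)
    # Plain expected-utility maximization follows the full causal path.
    return reward(action, overseer_approves(action))

for lcdt in (False, True):
    best = max(ACTIONS, key=lambda a: expected_utility(a, lcdt))
    print(f"{'LCDT' if lcdt else 'plain EU'} picks: {best}")
```

Because the deceptive plan's whole payoff flows through the overseer node, severing that link removes the incentive, which is the sense in which LCDT rules out deception.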