Due to game-theoretic considerations, the order in which we do things may matter (e.g. due to commitment races in logical time).
Can you give me an example? I don’t see how this would work.
(To be clear, I’m imagining that the universe stops and only I continue thinking; there are no other agents thinking while I’m thinking, so as far as I can tell I should just implement UDT.)
Creating some sort of commitment device that binds us to follow UDT, before we evaluate some set of hypotheses, is one example of a potentially consequential intervention.
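To make this concrete, here is a minimal toy sketch (my own illustration with made-up payoffs, not something from the original discussion) of counterfactual mugging: a policy fixed before the coin flip is observed does better in expectation than a choice made after updating on the observation, which is the sense in which committing before evaluating the hypotheses can matter.

```python
# Toy counterfactual-mugging sketch (illustrative payoffs only).
# Compares a policy fixed *before* observing the coin (UDT-style commitment)
# with a choice made *after* updating on the observation.

P_HEADS = 0.5
REWARD_IF_POLICY_PAYS = 10_000  # paid out on heads, iff the agent's policy is to pay on tails
COST_OF_PAYING = 100            # paid by the agent on tails, if its policy is to pay

def ex_ante_expected_value(policy_pays_on_tails: bool) -> float:
    """Expected value of a policy, evaluated before the coin flip is observed."""
    heads_branch = REWARD_IF_POLICY_PAYS if policy_pays_on_tails else 0
    tails_branch = -COST_OF_PAYING if policy_pays_on_tails else 0
    return P_HEADS * heads_branch + (1 - P_HEADS) * tails_branch

# Committing ex ante: the "pay" policy wins in expectation (4950 vs. 0).
assert ex_ante_expected_value(True) > ex_ante_expected_value(False)

# Deciding only after updating on "the coin came up tails": paying now looks like
# a pure loss (-100 vs. 0), so an agent that evaluates the hypothesis first refuses,
# which is why locking in the policy beforehand can be consequential.
assert 0 > -COST_OF_PAYING
```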
As an aside, my understanding is that in environments that involve multiple UDT agents, UDT doesn’t necessarily work well (or may not even be well-defined?).
Also, if we end up using SGD to train a model that becomes an aligned AGI, maybe we should figure out how to make sure that that model “follows” a good decision theory. (Or does this happen by default? Does it depend on whether “following a good decision theory” is helpful for minimizing expected loss on the training set?)