Here are the results of an abbreviated literature search for papers that bring quantile-case concepts into contact with contemporary RL and/or deep learning:
Unifying PAC and Regret: Uniform PAC Bounds for Episodic Reinforcement Learning. Christoph Dann, Tor Lattimore, Emma Brunskill. NIPS 2017.
Defines a concept of “Uniform-PAC bound”, which is roughly when p-quantile-case episodic regret scales polynomially in 1/(1−p).
Proves that a Uniform-PAC bound implies:
PAC bound
Uniform high-probability regret bound
Convergence to zero regret with high probability
Constructs an algorithm, UBEV, that has a Uniform-PAC bound
Empirically compares quite favorably to other algorithms with only PAC or regret bounds
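For my own reference, the Uniform-PAC definition is roughly the following (paraphrased from memory; the paper's notation may differ):

```latex
% Let \Delta_k be the regret of episode k: the gap between the optimal
% value and the value of the policy the algorithm plays in episode k.
% An algorithm is Uniform-PAC for confidence \delta if a SINGLE
% probability-(1-\delta) event covers all accuracy levels at once:
\Pr\Big[\, \forall \varepsilon > 0:\;
  \big|\{ k : \Delta_k > \varepsilon \}\big|
  \;\le\; F(\varepsilon, \delta) \,\Big] \;\ge\; 1 - \delta,
% where F is polynomial in 1/\varepsilon and \log(1/\delta).
```

The "for all ε inside one high-probability event" quantifier is what makes the implications work: fixing a single ε recovers a PAC bound, and integrating over the tail of ε yields a high-probability regret bound.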
Policy Certificates: Towards Accountable Reinforcement Learning. Christoph Dann, Lihong Li, Wei Wei, Emma Brunskill. ICML 2019.
Defines an even stronger concept of “IPOC bound”, which implies Uniform-PAC, and also outputs a certified per-episode regret bound along with each proposed action.
Constructs an algorithm ORLC that has an IPOC-bound
Empirically compares favorably to UBEV
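As I understand the IPOC requirement (paraphrasing; check the paper for the exact statement): the algorithm outputs a certificate ε_k alongside each episode's policy, and with high probability every certificate is simultaneously valid and the certificates shrink uniformly:

```latex
% \Delta_k = regret of episode k, \varepsilon_k = certificate output
% before episode k. The IPOC condition, roughly:
\Pr\Big[\, \forall k:\ \Delta_k \le \varepsilon_k
  \ \ \text{and}\ \ \forall \varepsilon > 0:\
  \big|\{ k : \varepsilon_k > \varepsilon \}\big|
  \le F(\varepsilon, \delta) \,\Big] \ \ge\ 1 - \delta.
% On this event \varepsilon_k \ge \Delta_k, so the second clause
% implies the Uniform-PAC condition on the true regrets.
```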
Revisiting Generalization for Deep Learning: PAC-Bayes, Flat Minima, and Generative Models. Gintare Dziugaite. December 2018 PhD thesis under Zoubin Ghahramani.
Lipschitz Lifelong Reinforcement Learning. Erwan Lecarpentier, David Abel, Kavosh Asadi, et al. AAAI 2021.
Defines a pseudometric on the space of all MDPs
Proves that the mapping from an MDP to its optimal Q-function is (pseudo-)Lipschitz
Uses this to construct an algorithm LRMax that can transfer-learn from past similar MDPs while also being PAC-MDP
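The headline structural result, as best I can reconstruct it (the paper's actual pseudometric is defined more carefully, via a local dissimilarity between the two MDPs' models):

```latex
% For MDPs M, \bar{M} on shared state and action spaces, with d(\cdot,\cdot)
% the paper's pseudometric on the space of MDPs and K some constant,
\big| Q^*_{M}(s,a) - Q^*_{\bar{M}}(s,a) \big|
  \;\le\; K \, d\big(M, \bar{M}\big)
  \qquad \text{for all } (s,a),
% so optimal Q-values learned on a past task yield valid bounds on the
% optimal Q-values of any nearby new task, which is what LRMax exploits.
```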
Uniform-PAC Bounds for Reinforcement Learning with Linear Function Approximation. Jiafan He, Dongruo Zhou, Quanquan Gu. NeurIPS 2021.
Constructs an algorithm FLUTE that has a Uniform-PAC bound with a certain linearity assumption on the structure of the MDP being learned.
Beyond No Regret: Instance-Dependent PAC RL. Andrew Wagenmaker, Max Simchowitz, Kevin Jamieson. August 2021 preprint.
Learning PAC-Bayes Priors for Probabilistic Neural Networks. María Pérez-Ortiz, Omar Rivasplata, Benjamin Guedj, et al. September 2021 preprint.
Tighter Risk Certificates for Neural Networks. María Pérez-Ortiz, Omar Rivasplata, John Shawe-Taylor, Csaba Szepesvári. ICML 2021.
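Both of these Pérez-Ortiz et al. papers build on the classic PAC-Bayes-kl bound (Seeger/Maurer form):

```latex
% Prior \pi fixed before seeing the n-sample S; losses in [0,1];
% \hat{L}_S(\rho), L_D(\rho): expected empirical / true loss under
% posterior \rho. With probability at least 1 - \delta over S,
% simultaneously for ALL posteriors \rho:
\mathrm{kl}\big( \hat{L}_S(\rho) \,\big\|\, L_D(\rho) \big)
  \;\le\; \frac{ \mathrm{KL}(\rho \,\|\, \pi)
           + \ln\tfrac{2\sqrt{n}}{\delta} }{ n },
% where kl is the binary relative entropy.
```

The main lever for tightening the certificate is making KL(ρ‖π) small; learning the prior on a held-out data split (as in the priors paper) is one way to do that.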
PACOH: Bayes-Optimal Meta-Learning with PAC-Guarantees. Jonas Rothfuss, Vincent Fortuin, Martin Josifoski, Andreas Krause. ICML 2021.
I would add: How useful is quantilization for specification gaming? Ryan Carey.