I think humans are fairly weird because we were selected for an objective that is unlikely to be what we select for in our AIs.
That said, if we model AI success as driven by model size and compute (with maybe innovations in low-level architecture), then I think that the way humans represent objectives is probably fairly close to what we ought to expect.
If we model AI success as mainly innovative high-level architecture, then I think we will see more explicitly represented objectives.
My tentative sense is that for AI to be interpretable (and safer) we want it to be the latter kind, but given enough compute the former kind of AI will give better results, other things being equal.
Here, what I mean by low-level architecture is something like "we'll use lots of LSTMs instead of lots of plain RNNs, but keep the model structure simple: plug in the inputs, pass them through some layers, and read out the action probabilities". High-level architecture is something like "let's organise the model as an enormous flowchart of pieces, each designed to play a particular role: here's the observation embedding, here's the search in latent model space, here's the …"
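To make the contrast concrete, here is a minimal sketch (in PyTorch; the module and parameter names are purely illustrative and not from anything above) of the "low-level" style: one recurrent cell, a readout head, and nothing else, where the only real architectural decision is which cell to use.

```python
import torch
import torch.nn as nn

# Sketch of the "low-level architecture" style: observations go in, pass
# through a recurrent layer, and action probabilities come out. Swapping
# nn.RNN for nn.LSTM is the kind of low-level change meant in the text.
class SimpleRecurrentPolicy(nn.Module):
    def __init__(self, obs_dim: int, hidden_dim: int, n_actions: int):
        super().__init__()
        self.lstm = nn.LSTM(obs_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, n_actions)

    def forward(self, obs_seq: torch.Tensor) -> torch.Tensor:
        # obs_seq: (batch, time, obs_dim) -> action probabilities at the last step
        features, _ = self.lstm(obs_seq)
        return torch.softmax(self.head(features[:, -1]), dim=-1)

# The "high-level" alternative would instead wire many purpose-built modules
# together (observation embedding, latent world model, search over latent
# plans, value head, ...), each with an explicitly designed role.

policy = SimpleRecurrentPolicy(obs_dim=16, hidden_dim=64, n_actions=4)
probs = policy(torch.randn(2, 10, 16))  # e.g. 2 trajectories of length 10
```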