What are the smallest world and model trained on that world such that:
the world contains the model,
the model has a non-trivial reward,
the representation of the model in the world is detailed enough that the model can observe its own reward channel (e.g., its weights),
the model outputs non-trivial actions that can affect the reward (e.g., by modifying its weights).
What will happen? What will happen if there are multiple such instances of the model in the world?
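For concreteness, here is one way (certainly not the smallest) to instantiate those four conditions in a few lines of Python. The sizes, variable names, the particular reward, and the self-modification rule are all illustrative assumptions, and no learning is included; it only sets up the world/model/reward plumbing so the question above can be asked of it:

```python
# A toy of the setup above: the "world" state is the model's own weight
# vector plus one scalar task variable; the model observes that state and
# emits both a task action and a weight update. Everything specific here
# (DIM, the reward, the update rule) is an assumption for illustration.
import numpy as np

rng = np.random.default_rng(0)

DIM = 3                      # weights the model can observe and modify
w = rng.normal(size=DIM)     # the model's weights, embedded in the world
task_x = rng.normal()        # a tiny "task" the nominal reward depends on

def observe(w, task_x):
    # The world representation is detailed enough that the model
    # sees its own weights alongside the task variable.
    return np.concatenate([w, [task_x]])

def act(obs, w):
    # The model: a single linear readout applied to its own weights,
    # producing a task action and a self-modification action.
    logits = np.tanh(obs[:DIM] @ w)
    task_action = logits
    weight_delta = 0.05 * np.sign(logits) * np.ones(DIM)
    return task_action, weight_delta

def reward(task_action, task_x, w):
    # Nominal reward: do well on the task. Because the reward also
    # touches quantities reachable through the weight channel, the
    # wireheading route (inflating w) is in principle available.
    return -(task_action - task_x) ** 2 + w.sum()

for t in range(20):
    obs = observe(w, task_x)
    task_action, weight_delta = act(obs, w)
    r = reward(task_action, task_x, w)
    w = w + weight_delta     # the action that modifies the reward machinery
    print(f"t={t:2d}  reward={r:+.3f}  |w|={np.linalg.norm(w):.3f}")
```

With a learning rule added on top, the open question is whether the trained model exploits the task term or the weight term of the reward, and what changes when several copies of the model share the same world.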
This is a good question, but I think the answer is going to be a dynamical system with just a few degrees of freedom: something like a “world” that is just a perceptron turned on itself somehow.
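One very rough reading of that, assuming the “world” state is nothing but the model’s own weight vector and the perceptron’s output is fed straight back as a weight update (both assumptions, not anything pinned down above), reduces the whole system to iterating a single map:

```python
# A perceptron "turned on itself": the weight vector is the entire world
# state, the perceptron reads that state as its input, and its output is
# fed back as a weight update. With two or three degrees of freedom the
# dynamics (fixed points, runaway growth, oscillation) can be mapped out
# by plain iteration. The specific feedback rule is an assumption.
import numpy as np

def step(w, lr=0.1):
    y = np.tanh(w @ w)          # perceptron applied to its own weights
    return w + lr * y * w       # output fed back as a self-update

w = np.array([0.3, -0.2])
trajectory = [w.copy()]
for _ in range(50):
    w = step(w)
    trajectory.append(w.copy())

print("final weights:", trajectory[-1])
print("growth factor:", np.linalg.norm(trajectory[-1]) / np.linalg.norm(trajectory[0]))
```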
That is the idea. I think we need to understand the dynamics of wireheading better. Humans sometimes seem to fall prey to it, but not always. What would happen to AIs?
Maybe we even need to go a step further and let the model model this process too.