TekhneMakre comments on Optimization, speculations on the X and only X problem.

TekhneMakre 8 Jun 2021 5:53 UTC
1 point
AF
Well, a main reason we’d care about codespace distance, is that it tells us something about how the agent will change as it learns (i.e. moves around in codespace). (This is involving time, since the agent is changing, contra your picture.) So a key (quasi)metric on codespace would be, “how much” learning does it take to get from here to there. The if True: x() else: y() program is an unnatural point in codespace in this metric: you’d have to have traversed the both the distances from null to x() and from null to y(), and it’s weird to have traversed a distance and make no use of your position. A framing of the only-X problem is that traversing from null to a program that’s an only-Xer according to your definition, might also constitute traversing almost all of the way from null to a program that’s an only-Yer, where Y is “very different” from X.
- Donald Hobson 8 Jun 2021 10:54 UTC
  LW: 2 AF: 1
  AF Parent
  I don’t think that learning is moving around in codespace. In the simplest case, the AI is like any other non self modifying program. The code stays fixed as the programmers wrote it. The variables update. The AI doesn’t start from null. The programmer starts from a blank text file, and adds code. Then they run the code. The AI can start with sophisticated behaviour the moment its turned on.
  So are we talking about a program that could change from an X er to a Y er with a small change in the code written, or with a small amount of extra observation of the world?
  - TekhneMakre 9 Jun 2021 1:48 UTC
    1 point
    Parent
    To clarify where my responses are coming from: I think what I’m saying is not that directly relevant to your specific point in the post. I’m more (1) interested in discussing the notion of only-X, broadly, and (2) reacting to the feature of your discussion (shared by much other discussion) that you (IIUC) consider only the extensional (input-output) behavior of programs, excluding from analysis the intensional properties. (Which is a reasonable approach, e.g. because the input-output behavior captures much of what we care about, and also because it’s maybe easier to analyze and already contains some of our problems / confusions.)
    From where I’m sitting, when a program “makes an observation of the world”, that’s moving around in codespace. There’s of course useful stuff to say about the part that didn’t change. When we really understand how a cognitive algorithm works, it starts to look like a clear algorithm / data separation; e.g. in Bayesian updating, we have a clear picture of the code that’s fixed, and how it operates on the varying data. But before we understand the program in that way, we might be unable to usefully separate it out into a fixed part and a varying part. Then it’s natural to say things like “the child invented a strategy for picking up blocks; next time, they just use that strategy”, where the first clause is talking about a change in source code. We know for sure that such separations can be done, because for example we can say that the child is always operating in accordance with fixed physical law, and we might suspect there’s “fundamental brain algorithms” that are also basically fixed. Likewise, even though Solomonoff induction is always just Solomonoff induction plus data, it can be also useful to understand SI(some data) in terms of understanding those programs that are highly ranked by SI(some data), and it seems reasonable to call that “the algorithm changed to emphasize those programs”.