ERO is another one of those things that I think has a lot of precedent in ML (there’s a paper I’ve been trying to dig up that uses natural language as the latent space in a variational autoencoder), but it doesn’t look very promising to me because of “steganography everywhere”. Like other approaches to interpretability, it seems worth pursuing, but I also worry that people believe too much in flawed interpretations.
Shard theory sounds like an interesting framing, and again something that a lot of people in ML would already agree with, but I’m not sure what it is supposed to be useful for or what sort of testable predictions it makes. It seems like a perspective worth keeping in mind, but I’m not sure I’d call it a research agenda.
RAT: I don’t see any way to “search over something like ‘the model’s beliefs about what it has seen’”; this seems like a potential sticking point. There’s more foundational research needed to figure out if/when/how we can even ascribe beliefs to a model, etc.
As a general comment, I think a lot of the “agendas” that people discuss here are not very well fleshed out, and the details are crucially important. I’m not even sure whether to call a lot of these ideas “agendas”; to me they seem more like “framings”. It is important to note that the ML community doesn’t publish “framings” unless they can be supported by concrete results (you can sometimes find people giving their perspective or framing on some problem in machine learning in blogs, keynotes, tutorials, etc.). So I think that people here often overestimate the novelty of their perspective. I think it is good to reward people for sharing these things, but given that a lot of other people might have similar thoughts and choose not to share them, I don’t think people here have quite the right attitude towards this. Writing up or otherwise communicating such framings without a strong empirical or theoretical contribution, and expecting credit for the ideas / citations of your work, would be considered “flag planting” in machine learning. Probably the best approach is some sort of middle ground.
ERO: I do buy the “steganography everywhere” argument if you are optimizing for outcomes. As described here (https://www.lesswrong.com/posts/pYcFPMBtQveAjcSfH/supervise-process-not-outcomes), outcome-based optimization is an attractor and will make your sub-components uninterpretable. While it’s not guaranteed, I do think that process-based optimization might suffer less from steganography (although only experiments will eventually show what happens). Any thoughts on process-based optimization?
Shard Theory: Yeah, “research agenda” was maybe the wrong term; I was mainly trying to refer to research directions/frameworks.
RAT: Agreed, at the moment this is not feasible.
See above, I don’t have strong views on what to call this; for some things “research agenda” is probably too strong a word. I appreciate your general comment, it helps me better understand your view on LessWrong vs., for example, peer review. I think you are right to some degree: there is a lot of content that is mostly about framing and does not provide concrete results. However, I think that sometimes the right framing is needed for people to actually come up with interesting results and to make things more concrete. Some examples I like are the inner/outer alignment framing (which I think initially didn’t come with any concrete results) and the recent Simulators post (https://www.lesswrong.com/posts/vJFdjigzmcXMhNTsx/simulators). In those cases I think the right framing helps tremendously in making progress with concrete research afterward. That said, I agree that grounded, concrete, result-oriented experimentation is needed to make real progress on a problem. So I do understand your point, and it can feel like flag planting in some cases.
Note: I’m also coming from academia, so I definitely understand your view and share it to some degree. However, I’ve personally come to appreciate some posts (usually by great researchers) that allow me to think about the Alignment Problem in a different way.
I read “Film Study for Research” just the other day (https://bounded-regret.ghost.io/film-study/, recommended by Jacob Steinhardt). In retrospect I realized that a lot of the posts here give a window into the rather “raw & unfiltered thinking process” of various researchers, which I think is a great way to practice research film study.
My understanding is that process-based optimization is just another name for supervising intermediate computations: you can treat anything computed by a network as an “output” in the sense of applying some loss function to it.
So (IIUC), it is not qualitatively different.
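To illustrate that point, here is a minimal sketch of my own (not taken from the discussion above): in a hypothetical two-stage model, outcome-based supervision puts a loss only on the final answer, while process-based supervision additionally attaches a loss to an intermediate computation. The module names and targets below are made up for the example; mechanically, both are just losses on things the network computes.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical two-stage model: an intermediate "reasoning" representation
# followed by a final answer head. Names and sizes are illustrative only.
class TwoStageModel(nn.Module):
    def __init__(self, d_in=16, d_hidden=32, n_classes=4):
        super().__init__()
        self.reasoner = nn.Linear(d_in, d_hidden)        # produces the intermediate computation
        self.answer_head = nn.Linear(d_hidden, n_classes)

    def forward(self, x):
        intermediate = torch.relu(self.reasoner(x))
        answer = self.answer_head(intermediate)
        return intermediate, answer

model = TwoStageModel()
x = torch.randn(8, 16)
intermediate, answer = model(x)

# Outcome-based supervision: a loss only on the final answer.
outcome_target = torch.randint(0, 4, (8,))
outcome_loss = F.cross_entropy(answer, outcome_target)

# Process-based supervision: also treat the intermediate computation as an
# "output" and attach a loss to it (here against a made-up reference target
# standing in for, e.g., a human-endorsed intermediate step).
process_target = torch.randn(8, 32)
process_loss = F.mse_loss(intermediate, process_target)

# Either way it is just a loss applied to something the network computes,
# which is the sense in which the two are not qualitatively different.
total_loss = outcome_loss + process_loss
total_loss.backward()
```

Whether supervising the intermediate step actually reduces steganography in practice is, of course, exactly the empirical question raised above.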