In case anyone stumbles across this post in the future: I found these posts from the past arguing both for and against some of the worries I gloss over here. I don’t think my post boils down merely to “recommender systems should be better aligned with human interests”, but that is a big theme.
I’m also not sold on this specific part, and I’m really curious about what supports the idea. One reason I don’t think it’s good to rely on this as the default expectation, though, is that I’m skeptical about humans’ ability to even know what the “best experience” is in the first place. I wrote a short, rambly post touching in part on my worries about online addiction: https://www.lesswrong.com/posts/rZLKcPzpJvoxxFewL/converging-toward-a-million-worlds
Basically, I buy into the idea that there are two distinct value systems in humans: a subconscious system whose learning comes mostly from evolutionary pressures, and a conscious/executive system that cares more about “higher-order values”, which I unfortunately can’t really explicate. Examples of the former: craving sweets, addiction to online games with well-engineered artificial fulfillment. Example of the latter: wanting to work hard, even when it’s physically demanding or mentally stressful, to make some kind of positive impact on broader society.
And I think today’s ML systems are asymmetrically exploiting the subconscious value system at the expense of the conscious/executive one. Even knowing all this, I really struggle to overcome instances of akrasia: controlling my diet, not drowning myself in entertainment consumption, etc. I feel like there should be some kind of attempt to level the playing field, so to speak, between the two value systems. At the very least, people interacting with powerful recommender (or just general) ML systems should get transparency and knowledge about this phenomenon; in the ideal case, they would have complete agency and control over which value system to prioritize, and to what extent.
Very interesting post!
1) I wonder what your thoughts are on how “disentangled” having a “dim world” perspective and being psychopathic are (completely “entangled” being: all psychopaths experience dim world and all who experience dim world are psychopathic). Maybe I’m also packing too many different ideas/connotations into the term “psychopathy”.
2) Also, the variability in humans’ local vs. “long-range” neuronal connections seems really interesting to me. My very unsupported, weak suspicion is that there may be a correlation between the ratio of these (or maybe the raw number of each) and the natural ability to learn and develop expertise in a very narrow domain (music, math?) vs. to develop big new ideas formed largely from cross-domain, interdisciplinary thinking. Do you have any thoughts on this? Depending on what we believe here, I think the answer to question 1) has some very interesting implications.
3) Finally, I wonder if the LessWrong community has a higher rate of “dim world” perspective-havers (or “psychopaths” in the narrowly defined sense of having lower thresholds for stimulation) than the base rate in the general population.
Just a small note that your ability to contribute via research doesn’t go from 0 now to 1 after you complete a PhD! As in, you can still contribute to AI Safety with research during a PhD.
Thanks for posting this! I was wondering if you might share more about your “isolation-induced unusual internal information cascades” hypothesis/musings! Really interested in how you think this might relate to low-chance occurrences of breakthroughs/productivity.
My original idea (and great points against the intuition by Rohin)
“To me, it feels viscerally like I have the whole argument in mind, but when I look closely, it’s obviously not the case. I’m just boldly going on and putting faith in my memory system to provide the next pieces when I need them. And usually it works out.”
This closely relates to the kind of experience that makes me think of language as post hoc symbolic-logic fitting to the neural computations of the brain, which kinda inspired the hypothesis that a language model trained on a distinct neural net would be similar to how humans experience consciousness (and get the illusion of free will).
So, I thought it would be a neat proof of concept if GPT-3 served as a bridge between something like a chess engine’s actions and verbal/semantic-level explanations of its goals, so that the actions are interpretable by humans (rough sketch below). E.g., bishop to g5: this develops a piece and pins the knight to the king, so you can add additional pressure to the pawn on d5 (or something like this).
In response, Reiichiro Nakano shared this paper: https://arxiv.org/pdf/1901.03729.pdf
which kinda shows it’s possible to have agent state/action representations in natural language for Frogger. There are probably glaring/obvious flaws with my OP, but this was what inspired those thoughts. Apologies if this is really ridiculous; I’m maybe suggesting ML-related ideas prematurely and having fanciful thoughts. Will be studying ML diligently to help with that.
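To make the original idea a bit more concrete, here’s a minimal sketch of the kind of bridge I had in mind, assuming python-chess plus a local UCI engine such as Stockfish; the language-model call (`explain_with_language_model`) is a hypothetical placeholder, not a real API:

```python
# A minimal sketch, assuming python-chess and a local UCI engine (e.g. Stockfish).
import chess
import chess.engine

def explain_with_language_model(prompt: str) -> str:
    # Hypothetical stand-in for a GPT-3 (or any language model) completion call,
    # ideally few-shot prompted with (position, move) -> verbal explanation pairs.
    return "[model-generated explanation would go here]"

def explain_engine_move(fen: str, engine_path: str) -> str:
    board = chess.Board(fen)
    engine = chess.engine.SimpleEngine.popen_uci(engine_path)
    try:
        # Ask the engine for its preferred move in this position.
        result = engine.play(board, chess.engine.Limit(time=0.5))
    finally:
        engine.quit()
    move_san = board.san(result.move)  # e.g. "Bg5"
    prompt = (
        f"Position (FEN): {fen}\n"
        f"Engine move: {move_san}\n"
        "Explain the idea behind this move in plain language:"
    )
    return explain_with_language_model(prompt)

# Usage (the engine path is an assumption about your setup):
# print(explain_engine_move(chess.STARTING_FEN, "/usr/local/bin/stockfish"))
```

The interesting (and hard) part is obviously the explanation step itself; the engine half is trivial, so this is only meant to show where a language model would slot in.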
Thanks, I hadn’t thought about those limitations.
For the basic features, I got used to navigating everything within an hour. I’ll be on the lookout for improvements to Roam or other note-taking programs like this.
Really appreciated this post and I’m especially excited for post 13 now! In the past month or two, I’ve been thinking about stuff like “I crave chocolate” and “I should abstain from eating chocolate” as being a result of two independent value systems (one whose policy was shaped by evolutionary pressure and one whose policy is… idk vaguely “higher order” stuff where you will endure higher states of cortisol to contribute to society or something).
I’m starting to lean away from this a little bit, and I think reading this post gave me a good idea of what your thoughts are, but it’d be really nice to get confirmation (and maybe clarification). Let me know if I should just wait for post 13. My prediction is that you believe there is a single (not dual) generator of human values, which is essentially moderated at the neurochemical level, e.g. by levels of dopamine/serotonin/cortisol. And yet this same generator, thanks to our sufficiently complex “thought generator”, can produce plans and thoughts such as “I should abstain from eating chocolate”, even though eating it would be a short-term dopamine hit, because it can simulate much further forward down the timeline and judge that the overall neurochemical feedback will be better on that longer horizon than caving in and eating the chocolate (I try to illustrate this with a toy sketch at the end of this comment). Is this correct?
If so, do you believe that because social/multi-agent navigation was essential to human evolution, the policy was heavily shaped by social world related pressures, which means that even when you abstain from the chocolate, or endure pain and suffering for a “heroic” act, in the end, this can all still be attributed to the same system/generator that also sometimes has you eat sugary but unhealthy foods?
Given that my angle on contributing to AI Alignment is to better elucidate what “human values” even are, I feel like I should try to resolve the competing ideas I’ve absorbed from LessWrong: two distinct value systems vs. a singular generator of values. This post was a big step for me in understanding how the latter idea can be coherent with the apparent contradictions between hedonistic and higher-level values.
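To check my own understanding of the single-generator picture, here’s the toy sketch I mentioned above (entirely my own construction, not anything from the post): one value function scoring candidate plans by their predicted “neurochemical feedback”, where only the simulation horizon changes which plan wins.

```python
# Toy illustration: a single value function plus forward simulation can output
# both the hedonistic choice and the "higher-order" choice, depending on how
# far down the timeline it simulates. The numbers are made up.
plans = {
    "eat chocolate now": [+5, -1, -1, -1, -1],  # big immediate hit, mild regret after
    "abstain and work":  [-1, +1, +2, +2, +3],  # short-term cost, larger later payoff
}

def plan_value(rewards, horizon, discount=0.9):
    # Discounted sum of predicted feedback over the first `horizon` steps.
    return sum((discount ** t) * r for t, r in enumerate(rewards[:horizon]))

for horizon in (1, 5):
    best = max(plans, key=lambda p: plan_value(plans[p], horizon))
    print(f"horizon={horizon}: the generator prefers '{best}'")
```

With a one-step horizon the chocolate wins; simulating five steps ahead flips the preference, with no need to posit a second value system.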