I’m not a neuroscientist, but I think a nice corollary to your theory would be to look into the minds of people with addictions. I’ve read a lot about addiction online and talked to a few addicts who are addicted to different things (alcohol, narcotics, etc.). They all seem to have induced a dim world within themselves, but a specific one that is usually only acted out through their drug of choice, rather than through something like a psychopathic child killing animals. Late-stage addicts do routinely engage in socially harmful behavior, though. All addicts seem to go through a similar ratchet effect: they get their high, the high becomes their baseline, the rest of the world goes dim or less stimulating by default, and to get a new high they have to move to a more potent drug or a more stimulating resource.
I also noticed they usually seem to hide their warped utility function from themselves, only addressing it when it takes over their entire life (rock bottom). I think this might be an interesting way to look at alignment problems, because addicts are basically misaligned mesa-optimized humans whose “inner process” hides their warped true reward function from themselves. From the point of view of addiction recovery, it’s also interesting to see how a misaligned person tries to change their values to something that produces better long-term outcomes for that person, with varying degrees of success.
I don’t know if looking into addictions would be something you’re interested in, but I figured it was worth bringing up when I read your dim world theory.