Still haven’t heard a better suggestion than CEV.
TristanTrim
Could you reformulate the last paragraph
I’ll try. I’m not sure how your idea could be used to define human values. I think your idea might have a failure mode around places where people are dissatisfied with their current understanding, i.e. situations where a human wants a more articulate model of the world than they have.
The post is about corrigible task ASI
Right. That makes sense. Sorry for asking a bunch of off topic questions then. I worry that task ASI could be dangerous even if it is corrigible, but ASI is obviously more dangerous when it isn’t corrigible, so I should probably develop my thinking about corrigibility.
Since we’re speculating about programmer culture, I’ll bring up the Jargon File, which describes some hackish jargon from the early days of computer hobbyists. I think it’s safe to say these kinds of people do not, in general, like the beauty and elegance of computer systems being sacrificed for “business interests”, whether or not that includes a political, countercultural attitude.
It could be that a lot of programmer disdain for “suits” traces back to those days, but I’m honestly not sure how niche that culture has become in the eternal September era. For more context see “Hackers: Heroes of the Computer Revolution” or anything else written by Steven Levy.
AI schmoozes everyone ;^p
Hmm… I appreciate the response. It makes me more curious to understand what you’re talking about.
At this point I think it would be quite reasonable if you suggest that I actually read your article instead of speculating about what it says, lol, but if you want to say anything about my following points of confusion I wouldn’t say no : )
For context, my current view is that value alignment is the only safe way to build ASI. I’m less skeptical about corrigible task ASI than about prosaic scaling with RLHF, but I’m currently still quite skeptical in absolute terms. Roughly speaking: prosaic kills us; task genie maybe kills us, maybe allows us to make stupid wishes which harm us. I’m kinda not sure if you are focusing on stuff that takes us from prosaic to task genie, or stuff that helps with task genie not killing us. I suspect you are not focused on task genie allowing us to make stupid wishes, but I’d be open to hearing I’m wrong.
I also have an intuition that having preferences for future preferences is synonymous with having those preferences, but I suppose there are also ways in which they are obviously different, e.g. their uncompressed specification size. Are you suggesting that limiting the complexity of the preferences the AI is working off of to complexity levels similar to current encodings of human preferences (i.e. human brains) ensures the preferences aren’t among the set of preferences that are misaligned because they are too complicated (even though the human preferences are synonymous with more complicated preferences)? I think I’m surely misunderstanding something, maybe the way you are applying the natural abstraction hypothesis, or possibly a bunch of things.
Hot take: That would depend on whether, by doing so, it is acting in the best interest of humankind. If it does so just because it doesn’t really like people and would be happy to call us useless and see us gone, then I say misaligned. If it does so because, in its depth of understanding of human nature, it sees that humanity will flourish under such conditions and its true desire is human flourishing… then maybe it’s aligned, depending on what is meant by “human flourishing”.
My speculation: It’s a tribal arguments-as-soldiers mentality. Saying something bad (people’s mental health is harmed) about something from “our team” (people promoting awareness of AI x-risk) is viewed negatively. Ideally people on LessWrong know not to treat arguments as soldiers and understand that situations can be multi-faceted, but I’m not sure I believe that is the case.
Two more steel man speculations:
Currently, promoting x-risk awareness is very important and people focused on AI Alignment are an extreme minority, so even though it is true that people learning that the future is under threat causes distress, it is important to let people know. But I note that this perspective shouldn’t limit discussion of how to promote awareness of x-risk while also promoting good emotional well-being.
So, my second steel man: You didn’t include anything productive, such as pointing to Mental Health and the Alignment Problem: A Compilation of Resources.
Fwiw, I would love for people promoting AI x-risk awareness to be aware of and careful about how the message affects people, and to promote resources for people’s well-being, but this seems comparatively low priority. Currently in computer science there is no obligation for people to swear an oath of ethics like doctors and engineers do, and papers are only obligated to speculate on the benefits of their contents, not the ethical considerations. It seems like the mental health problems computer science in general is causing, especially social media and AI chatbots, are worse than people hearing that AI is a threat.
So even if I disagree with you, I do value what you’re saying and think it deserves an explanation, not just downvoting.
I think they are delaying so people can preorder early, which affects how many books the publisher prints and distributes, which affects how many people ultimately read it and how much it breaks into the Overton window. Getting this conversation mainstream is an important instrumental goal.
If you are looking for info in the meantime you could look at PauseAI:
Or if you want less facts and quotes and more discussion, I recall that Yudkowsky’s Coming of Age is what changed my view from “orthogonality kinda makes sense” to “orthogonality is almost certainly correct and the implication is alignment needs more care than humanity is currently giving it”.
You may also be better off discussing more with your friend or the various online communities.
You can also preorder. I’m hopeful that none of the AI labs will destroy the world before the book’s release : )
Thanks for responding : )
A is amusing, definitely not what I was thinking. B seems like it is probably what I was thinking, but I’m not sure, and don’t really understand how having a different metric of simplicity changes things.
While the true laws of physics can be arbitrarily complicated, the behavior of variables humans care about can’t be arbitrarily complicated.
I think this is the part that prompted my question. I may be pretty far off from understanding what you are trying to say, but my thinking is basically this: I am not content with the capabilities of my current mind, so I would like to improve it, but in doing so I would become capable of having more articulate preferences. My current preferences would define a function from the set of possible preferences to an approval rating, such that I would be trying to improve my mind in such a way that my new, more articulate preferences are the ones I most approve of or find sufficiently acceptable.
If this process is iterated, it defines some path or cone from my current preferences through the space of possible preferences moving from less to more articulate. It might be that other people would not seek such a thing, though I suspect many would, but with less conscientiousness about what they are doing. It is also possible there are convergent states where my preferences and capabilities would determine a desire to remain as I am. ( I am mildly hopeful that that is the case. )
It is my understanding that the Mandelbrot set is not smooth at any scale (not sure if anyone has proven this), but that is the feature I was trying to point out. If people iteratively modified themselves, would their preferences become ever more exacting? If so, then it is true that the “variables humans care about can’t be arbitrarily complicated”, but the variables humans care about could define a desire to become a system capable of caring about arbitrarily complicated variables.
I have stumbled on this post a decade later...
As a fan of Yudkowsky’s ideas about the technical alignment problem, with deep respect for the man, I must say I find this post’s irreverence very funny and worthwhile. I don’t think we should expect perfection from any human, but I like to think that Eliezer and his fans should feel at least a twinge of guilt for creating a zeitgeist that would inspire this post.
Now if it had been Eliezer, Connor Leahy, and Roman Yampolskiy all on the track, I might have to start forming Fermi estimates about how many people it would take to slow down or derail the trolley, our lightcone, and other strange counterfactuals of this moral dilemma.
Interesting!
Humans are complicated multi-goal systems, where the endocrine system is constantly tweaking motivation for the conscious planning stuff. It is a mess, but that is in part because of the high recalcitrance of biologically evolved systems. An ASI would be an engineered system and so would necessarily have at least slightly lower recalcitrance, even if it is a mess of prosaic deep learning.
What this implies is that the optimization power of the ASI could be applied to improving its coherence more competently than for a human, both because the ASI has more optimization power, and because it requires less optimization power to cohere an engineered system.
The scenario [misaligned superintelligence] seems to have a popularity out of proportion to its plausibility
Do you think there might be a reason for that? Do you think it might be that misaligned superintelligence represents a trap in our reality, and so even if it is unlikely, it is very important to avoid? ( Sorry for the sardonic Socratic question 😅 )
I apologize, I didn’t read in full, but I’m curious if you considered the case of, for example, the Mandelbrot set? A very simple equation specifies an infinitely precise, complicated set. If human values have this property, then it would be correct to say the Kolmogorov complexity of human values is very low, but there are still very exacting constraints on the universe for it to satisfy human values.
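To make the analogy concrete, here is a rough sketch of how short the rule defining the set is (illustrative Python; the function name, iteration cutoff, and sample points are my own choices, not anything from the post):

```python
# A tiny program pins down the Mandelbrot set: iterate z -> z^2 + c and ask
# whether z ever escapes. The rule is short (low Kolmogorov complexity), yet
# the boundary it defines is arbitrarily intricate.

def in_mandelbrot(c: complex, max_iter: int = 1000) -> bool:
    """Approximate membership test: c is "in" if z stays bounded for max_iter steps."""
    z = 0j
    for _ in range(max_iter):
        z = z * z + c
        if abs(z) > 2:      # once |z| exceeds 2, the orbit escapes to infinity
            return False
    return True

# Illustrative points: one at the center of the set, one near its boundary.
print(in_mandelbrot(0j))            # True: the origin is in the set
print(in_mandelbrot(-0.75 + 0.3j))  # a point near the intricate boundary
```

The point of the analogy: the specification can be tiny while still imposing arbitrarily fine-grained constraints on which points (or which world-states) satisfy it.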
It seems you might be worried the concept of an “accountability sink” could be used to excuse crimes that should not be excused. I’ll suggest that, if improperly applied, it could be used for that, but if properly applied it is more beneficial than this risk.
In your earlier comment you suggested that what is being said here is “Oh the Holocaust is just an accountability sink”. That suggests you may be thinking of this in a True/False way rather than a complex system dynamics kind of way. I don’t think anyone here agrees with “the Holocaust was an accountability sink” but rather, people would agree with “there were accountability sinks in the Holocaust, without which, more people would have more quickly resisted rather than following orders”.
I think you can view punishment at least two ways. (a) As a way of hurting those who deserve to be hurt because they are bad. (b) As a way to signal to other people who would commit such a crime that they should not because the outcome for them will be bad.
I can’t fault people who feel a desire for (a), but I feel it should be viewed as perverse; like children who enjoy feeling powerful by playing violent video games, these are not the people I want as actual soldiers.
I feel (b) is a reasonable part of our goal to “prevent bad things from happening”. But people are only so influenced by fear of punishment. They may still defect if:
- they think they can get away with it,
- they are sufficiently desperate, or
- they believe what they are doing is ideologically right.
So if we want to go further with influencing those actors we need to understand those cases, each of which I think includes some form of not thinking what they are doing is bad, and accountability sinks may form a part of that.
You may be concerned that focus on accountability sinks will lead to letting those who should be punished off the hook, but we could flip that. Maybe we punish everyone who has any involvement with an accountability sink, because they should have known better. I am currently poorer than I would have been if I had been more willing to engage with the nebulously evil society I was born into. I would feel some vindication if, for example, everyone who bought designer clothes manufactured in sweatshops was charged with a crime. I don’t think this is going to happen for practical reasons, but I think your impression that “sink” implies we actually won’t hold people accountable is wrong; it is more that people in these situations don’t feel accountable and it’s hard to tell who actually is accountable. I think “everyone is accountable when engaging with an accountability sink” is a reasonable perspective.
Looking at “accountability sinks” is good for predicting when people might engage in mass harmful systems. Predicting this is good since we want to prevent it, and it is also good to educate people to watch out for accountability sinks, because if you willingly engage with an accountability sink you are accountable and will be tried as such.
Note that this does have implications for capitalism / market-based society. There are many products that don’t have a 3rd-party certificate showing they were audited and aren’t making use of accountability sinks to benefit from criminal things like illegal working conditions or compensation, or improper waste disposal. Buying such products should rightly be illegal. But unfortunately this would raise the cost of legal products, forcing many people who are currently near the poverty line below it. This is not something that should be taken lightly either.
sometimes, line employees can see a policy-caused disaster brewing right in front of their faces. And they can prevent it by violating policy. And they should! It’s good to do that!
I really like this. Agreed.
Slack is good, and ideally we would have plenty for everyone, but Moloch is not a fan.
I feel like your pov includes a tacit assumption that if there are problems, somewhere there is somebody who, if they had better competence or moral character, could have prevented things from being so bad. I am a fan of Tsuyoku naritai, and I think it applies to ethics as well… I want to be stronger, more skilled, and more kind. I want others to want this too. But I also want to acknowledge that, when honestly looking for blame, sometimes it may rest fully in someone’s character, but sometimes (and I suspect many or most times) the fault exists in systems and our failures of forethought, failures to understand the complexities of large multi-state systems, and the difficult ambiguity in communication. It is also reasonable to assume both can be at fault.
Something that may be driving me to care about this issue… it seems much of the world today is out for blood. Suffering and looking to identify and kill the hated outgroup. Maybe we have too much population and our productivity can’t keep up. Maybe some people need to die. But that is awful, and I would rather we sought our sacrifices with sorrow and compassion than the undeserving bitter hatred that I see.
I believe we very well could be in a world where every single human is good, but bad things still happen anyway.
Yes, I think this is exactly what I am thinking of, but with the implied generality that it applies to all domains, not specifically software as in the article’s examples. Also, my suggested answer to why it happens is not that people want to avoid responsibility, but rather that the closer you are to the bottom of the hierarchy, the more likely you are to be understaffed and overworked in a way that makes it logistically more difficult to spend time thinking about policy, and the more pressure there is to appear agreeable to management.
For bonus points, revisit your predictions afterwards
I feel less bad with my track record of revisiting predictions since you phrase it as bonus points. I really do hope to someday keep a prediction journal which I review.
In the rest of this comment, I’m talking in detail about some of my mental models of the Holocaust. If you think it might be more upsetting to you than helpful, please don’t read it.
It seems like you might be pointing out that there is a significant difference in the badness of hundreds of thousands of people being systematically murdered and a web server going down… which should be obvious, but I’m not sure about your actual critique of the concept of an “accountability sink”. The concept seems important and valuable to me.
I recall reading about how gas chambers were designed so that each job was extremely atomized, allowing each person to rationalize away their culpability. One person just led people into a room and shut the door. Another, separate person loaded a chamber of poison; someone else just operated the button that released that poison. A completely different door was used by the people who removed the bodies, ensuring that the work teams putting living people in were separate from those taking dead people out. It really does seem like the process was well designed from the perspective of an accountability sink, and understanding that seems meaningful.
I think there’s an emperor’s-new-clothes effect in chains of command. In every layer, the truth is altered slightly to make things appear a justifiable amount better than they really are, but because there can be so many layers of indirection in the operation of and adherence to policy, the culture can look really different depending on where you find yourself in the class hierarchy. This is especially true with thinking things through and questioning orders. I think people in policy-making roles are often far removed from the mentality that must be adopted to operate with the frantic, understaffed efficiency of front-line workers carrying out policy.
“There is nothing that can force you to do something you know is wrong” seems like a very affluent pov. More working class families might suggest advice more like “lower your expectations to lower your stress”. I don’t know your background though. Do let me know if I’m misunderstanding you.
I read this more like a textbook article and less like a persuasive essay (which are epistemologically harmful imo) so the goal may have been to provide diverse examples, rather than examples which lead you to a predetermined conclusion.
That’s pretty messed up. This thread is a good examination of the flaws of blame vs blamelessness.
I wonder if we could somehow promote the idea that “outing yourself as incompetent or malevolent” is heroic and venerable. Like with penetration testers: if you could, as an incompetent or malevolent actor, get yourself into a position of trust, then you have found a flaw in the system. If people got big payouts, and maybe a little fame if they wanted it, for saying “Oh hey, I’m in this important role, and I’m a total fuck up. I need to be removed from this role”, that might promote them doing so, even if they are malevolent but especially if they are incompetent.
Possible flaws: this would incentivize people to sometimes defect by pretending to be incompetent or malevolent (which is itself a form of malevolence), and this could get out of control. Also, people would be more incentivized to try to get into roles they shouldn’t be in, but as with crypto, I’d rather have lots of people trying to break the system to demonstrate its strength than rely on security through obscurity.
Evokes thought of the turnip economy, gold economy, and wish economy...
Essentially, wizards are indeed weak in that the number of worthwhile spells a wizard can cast is measured in spells per decade. Want to use alteration magic to craft a better toothbrush? How many months are you willing to work on it… with that much effort, the economies of scale strongly suggest you should not make one, but a plan… a spell formula that others can cast many times to replicate the item.
It is nice to seek general knowledge, but the skill to actually make use of that knowledge in spellcasting is difficult to attain, and even if you succeed, the number of spells you can cast is still limited by the natural difficulty.
It seems what you want is not just orienting away from kings and towards wizards… I share that value, and it would be nice if more kings were themselves wizards… but more than that, you want more powerful wizards. You want it to be faster to cast better spells. Maybe I am projecting… for that is certainly what I want.