Still haven’t heard a better suggestion than CEV.
sometimes, line employees can see a policy-caused disaster brewing right in front of their faces. And they can prevent it by violating policy. And they should! It’s good to do that!
I really like this. Agreed.
Slack is good, and ideally we would have plenty for everyone, but Moloch is not a fan.
I feel like your pov includes a tacit assumption that if there are problems, somewhere there is somebody who, if they had better competence or moral character, could have prevented things from being so bad. I am a fan of Tsuyoku naritai, and I think it applies to ethics as well… I want to be stronger, more skilled, and more kind. I want others to want this too. But I also want to acknowledge that, when honestly looking for blame, sometimes it may rest fully in someone’s character, but sometimes (and I suspect many or most times) the fault exists in systems: in our failures of forethought, our failure to understand the complexities of large multi-state systems, and the difficult ambiguity in communication. It is also reasonable to assume both can be at fault.
Something that may be driving me to care about this issue… it seems much of the world today is out for blood: suffering, and looking to identify and kill the hated outgroup. Maybe we have too much population and our productivity can’t keep up. Maybe some people need to die. But that is awful, and I would rather we sought our sacrifices with sorrow and compassion than with the undeserved, bitter hatred that I see.
I believe we very well could be in a world where every single human is good, but bad things still happen anyway.
Yes, I think this is exactly what I am thinking of, but with the implied generality that it applies to all domains, not just software as in the article’s examples. Also, my suggested answer for why it happens is not that people want to avoid responsibility, but rather that the closer you are to the bottom of the hierarchy, the more likely you are to be understaffed and overworked in a way that makes it logistically more difficult to spend time thinking about policy, and the more pressure you feel to appear agreeable to management.
For bonus points, revisit your predictions afterwards
I feel less bad about my track record of revisiting predictions, since you phrase it as bonus points. I really do hope to someday keep a prediction journal which I review.
In the rest of this comment, I’m talking in detail about some of my mental models of the Holocaust. If you think it might be more upsetting to you than helpful, please don’t read it.
It seems like you might be pointing out that there is a significant difference in the badness of hundreds of thousands of people being systematically murdered and a web server going down… which should be obvious, but I’m not sure about your actual critique of the concept of an “accountability sink”. The concept seems important and valuable to me.
I recall reading about how gas chambers were designed so that each job was extremely atomized, allowing each person to rationalize away their culpability. One person just led people into a room and shut the door. Another, separate person loaded a chamber of poison; someone else just operated the button that released that poison. A completely different door was used by the people who removed the bodies, ensuring that the work teams putting living people in were separate from those taking dead people out. It really does seem like the process was well designed from the perspective of an accountability sink, and understanding that seems meaningful.
I think there’s an emperor’s-new-clothes effect in chains of command. In every layer, the truth is altered slightly to make things appear a justifiable amount better than they really are, but because there can be so many layers of indirection in the operation of, and adherence to, policy, the culture can look really different depending on where you find yourself in the class hierarchy. This is especially true with thinking things through and questioning orders. I think people in roles to make policy are often far removed from the mentality that must be adopted to operate in the frantic, understaffed efficiency of front-line workers carrying out policy.
“There is nothing that can force you to do something you know is wrong” seems like a very affluent pov. More working-class families might suggest advice more like “lower your expectations to lower your stress”. I don’t know your background though. Do let me know if I’m misunderstanding you.
I read this more like a textbook article and less like a persuasive essay (which are epistemologically harmful imo) so the goal may have been to provide diverse examples, rather than examples which lead you to a predetermined conclusion.
That’s pretty messed up. This thread is a good examination of the flaws of blame vs blamelessness.
I wonder if we could somehow promote the idea that “outing yourself as incompetent or malevolent” is heroic and venerable. Like with penetration testers: if you could, as an incompetent or malevolent actor, get yourself into a position of trust, then you have found a flaw in the system. If people got big payouts, and maybe a little fame if they wanted it, for saying, “Oh hey, I’m in this important role, and I’m a total fuck-up. I need to be removed from this role”, that might promote them doing so, even if they are malevolent, but especially if they are incompetent.
Possible flaws are that this would incentivize people to sometimes defect by pretending to be incompetent or malevolent, which is itself a form of malevolence, and this could get out of control. Also, people would be more incentivized to try to get into roles they shouldn’t be in, but as with crypto, I’d rather have lots of people trying to break the system to demonstrate its strength than rely on security through obscurity.
I like the phrase “Trust Network” which I’ve been hearing here and there.
TRUST NO ONE seems like a reasonable approximation of a trust network before you actually start modelling a trust network. I think it’s important to think of trust not as a boolean value, not “who can I trust” or “what can I trust” but “how much can I trust this” and in particular, trust is defined for object-action pairs. I trust myself to drive places since I’ve learned how and done so many times before, but I don’t trust myself to pilot an airplane. Further, when I get on an airplane, I don’t personally know the pilot, yet I trust them to do something I wouldn’t trust myself to do. How is this possible? I think there is a system of incentives and a certain amount of lore which informs me that the pilot is trustworthy. This system which I trust to ensure the trustworthiness of the pilot is a trust network.
When something in the system goes wrong, maybe blame can be traced to people, maybe just to systems, but in each case something in the system has gone wrong: it has trusted someone or some process that was not ideally reliable. That accountability is important for improving the system. Not because someone must be punished, but because, if the system is to perform better in the future, some part of it must change.
I agree with the main article that accountability sinks which protect individuals from punishment for their failures are often very good. In a sense, this is what insurance is, which is a good enough idea that it is legally enforced for dangerous activities like driving. I think accountability sinks in this case paradoxically make people less averse to making decisions. If the process has identified this person as someone to trust with some class of decision, then that person is empowered to make those decisions. If there is a problem because of it, it is the fault of the system for having identified them improperly.
I wonder if anyone is modelling trust networks like this. It seems like I might be describing reliability engineering with Bayes nets. In any case, I think it’s a good idea and we should have more of it. Trace the things that can be traced and make subtle accountability explicit!
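To make that concrete, here is a minimal sketch (in Python, with made-up agents and numbers) of trust as a value over object-action pairs, and of a chain of vouching relationships standing in for trust I can’t establish directly. It assumes trust composes multiplicatively along the chain, which is a naive series-reliability model rather than a real Bayes net.

```python
# A minimal sketch of a trust network. Assumptions: trust is a probability of
# reliable performance attached to (agent, action) pairs, and trust along a
# vouching chain composes multiplicatively (naive series reliability).
# All names and numbers are made up for illustration.

# Direct trust: how much I trust an agent to perform a specific action.
direct_trust = {
    ("me", "drive_car"): 0.99,
    ("me", "fly_airliner"): 0.01,
    ("regulator", "certify_airlines"): 0.97,
    ("airline", "vet_pilots"): 0.95,
    ("pilot", "fly_airliner"): 0.999,  # conditional on being properly vetted
}

def chained_trust(chain):
    """Trust in the final (agent, action) pair, discounted by every link in
    the vouching chain: if any link fails, the whole chain fails."""
    p = 1.0
    for agent_action in chain:
        p *= direct_trust[agent_action]
    return p

# I never meet the pilot, but I trust the regulator, which certifies the
# airline, which vets the pilot.
flight_chain = [
    ("regulator", "certify_airlines"),
    ("airline", "vet_pilots"),
    ("pilot", "fly_airliner"),
]

print(chained_trust(flight_chain))           # ~0.92: trust via the network
print(direct_trust[("me", "fly_airliner")])  # 0.01: trust in myself directly
```

A real model would need correlated failures and evidence updating, which is where the Bayes-net framing would actually earn its keep, but even this toy version captures why I trust a stranger to do something I wouldn’t trust myself to do.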
Sorry for replying to a 6-month-old post.
I think the statement is more about context or framing.
You can take a shard agent and produce a utility function to describe its values, and you could have AIXI maximize that function. So in that sense, “shard agents” are a subset of agents AIXI can implement, but AIXI by itself is not a shard agent.
Note that it is not enough to say the utility function values multiple different things; that’s normal for a utility function. It has to specify that the agent acts to optimize different utility functions depending on the context, which makes the combined utility function massively more complicated. This justifies having the distinction between an agent with a single utility function and an agent with context-dependent utility.
And I would say, even if you have AIXI running with a sharded utility function, it is implementing a shard agent while itself being understood as a non-shard agent.
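As a toy illustration of that distinction, here is a minimal sketch (in Python, with made-up contexts and values): each shard optimizes its own thing in its own context, and you can mechanically flatten the whole agent into one utility function over context-outcome pairs that an AIXI-like maximizer could take as its goal, but that flattened function has to carry all the context-dependence inside it.

```python
# Toy sketch of a shard agent flattened into a single utility function.
# Contexts, outcomes, and values are made up for illustration.

# Each shard is a (context predicate, utility function) pair.
shards = [
    (lambda ctx: ctx == "hungry",  lambda outcome: outcome.get("food", 0)),
    (lambda ctx: ctx == "tired",   lambda outcome: outcome.get("rest", 0)),
    (lambda ctx: ctx == "curious", lambda outcome: outcome.get("novelty", 0)),
]

def flattened_utility(context, outcome):
    """The single utility function implied by the shard agent. It has to
    encode the context-dependence explicitly, which is what makes it so much
    more complicated than any individual shard's utility."""
    for active_in, utility in shards:
        if active_in(context):
            return utility(outcome)
    return 0.0  # no shard is active in this context

# The flattened function ranks the same outcome differently by context:
print(flattened_utility("hungry", {"food": 3, "rest": 5}))  # 3
print(flattened_utility("tired",  {"food": 3, "rest": 5}))  # 5
```

A maximizer handed `flattened_utility` is, in the sense above, implementing the shard agent without itself being one.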
I guess one kind of critique I might expect for this is that it fails to connect with a strategy. Terminology is for communication within a community. What community do these recommendations apply to, and why? I’d like to write a post sometime exploring that. If you know of anyone exploring that sort of idea, please let me know.
A more general critique may be on the value of making recommendations about terminology and language use. My intuition is that being conscientious about our language is important, but critical examination of that idea seems valid.
I think it’s worth distinguishing between what I’ll call “parallel SI” vs “collective SI”.
Parallel SI is when you have something more intelligent because it has a lot of the same intelligence running in parallel. Strictly parallel SI would need to rely on random differences in decisions and Schelling points, since communication between threads would not be possible.
Collective SI requires parallel SI, but additionally has organization of the work being done by each intelligence. I think how far this concept can be pushed is unclear, but I don’t see any reason that sufficiently clever organization of human-level intelligence couldn’t achieve depth SI or even speed SI.
The idea is that the interaction of many human level intelligences can be made to emulate the mind of a greater intelligence. This means the evolved organizational structure found in corporations could potentially be SI in ways you don’t get just from parallel SI.
✨ I just donated 71.12 USD (100 CAD 🇨🇦) ✨
I’d like to donate a more relevant amount, but I’m finishing my undergrad and have no income stream… in fact, I’m looking to become a Mech Interp researcher (& later focus on agent foundations), but I’m not going to be able to do that if misaligned optimizers eat the world, so I support Lightcone’s direction as I understand it (policy that promotes AI not killing everyone).
If anyone knows of good ways to fund myself as a MI researcher, ideally focusing on this research direction I’ve been developing, please let me know : )
WRT formatting, thanks, I didn’t realise the markdown needs two new lines for a paragraph break.
I think CoT and its dynamics, as they relate to review and RSI, are very interesting & useful to be exploring.
Looking forward to reading the stepping stone and stability posts you linked. : )
Yes, you’ve written more extensively on this than I realized. Thanks for pointing out other relevant posts, and sorry for not having taken the time to find them myself; I’m trying to err more on the side of communication than I have in the past.
I think math is the best tool to solve alignment. It might be emotional: I’ve been manipulated and hurt by natural language and the people who prefer it to math, and I have always found engaging with math to be soothing, or at least sobering. It could also be that I truly believe the engineering rigor that comes with understanding something well enough to do math to it is extremely worthwhile for building a thing of the importance we are discussing.
Part of me wants to die on this hill and tell everyone who will listen: “I know it’s impossible, but we need to find ways to make it possible to give the math people the hundred years they need, because if we don’t then everyone dies, so there’s no point in aiming for anything less. It’s unfortunate, because it means it’s likely we are doomed, but that’s the truth as I see it.” I just wonder how much of that part of me is my oppositional defiant disorder and how much is my strategizing for the best outcome.
I’ll be reading your other posts. Thanks for engaging with me : )
WRT “I don’t want this attempted in any light-cone I inhabit”, well, neither do I. But we’re not in charge of the light cone.
That really is a true and relevant fact, isn’t it? 😭
It seems like aligning humans really is much more of a bottleneck rn than aligning machines, and not because we are at all on track to align machines.
I think you are correct about the need to be pragmatic. My fear is that there may not be anywhere on the scale from “too pragmatic, failed to actually align ASI” to “too idealistic, failed to engage with actual decision makers running ASI projects” where we get good outcomes. It’s stressful.
The organized mind recoils. This is not an aesthetically appealing alignment approach.
Praise Eris!
No, but seriously, I like this plan, with the caveat that we really need to understand RSI and what is required to prevent it first. Also, I think the temptation to allow these things to open up high-bandwidth channels to modalities other than language is going to be really, really strong, and if we go forward with this we need a good plan for resisting that temptation, and a good way to know when not to resist it.
Also, I’d like it if this were thought of as a step on the path to cyborgism/true value alignment, and not as a true ASI alignment plan on its own.
I was going to say “I don’t want this attempted in any light-cone I inhabit,” but I realize there’s a pretty important caveat. On its own, I think this is a doom plan, but if there were a sufficient push to understand RSI dynamics before and during, then I think it could be good.
I don’t agree that it’s “a better idea than attempting value alignment”; it’s a better idea than dumb value alignment for sure, but imo only skilled value alignment or self-modification (no AGI, no ASI) will get us to a good future. But the plans aren’t mutually exclusive. First studying RSI, then making sufficiently non-RSI AGI with instruction-following goals, then using that non-RSI AGI to figure out value alignment, probably using GSLK and cyborgism, seems to me like a fine plan. At least it does at present date, present time.
I like this post. I like goals selected from learned knowledge (GSLK). It sounds a lot like what I was thinking about when I wrote how-i-d-like-alignment-to-get-done. I plan to use the term GSLK in the future. Thank you : )
“we’ve done so little work on alignment that I think it might actually be more like additive, from 1% to 26% or 50% to 75% with ten extra years relative to the real current odds if we press ahead—which nobody knows.” 😭🤣 I really want “We’ve done so little work the probabilities are additive” to be a meme. I feel like I do get where you’re coming from.
I agree about pause concern. I also really feel that any delay to friendly SI represents an enormous amount of suffering that could be prevented if we got to friendly SI sooner. It should not be taken lightly. And being realistic about how difficult it is to align humans seems worthwhile.

When I talk to math people about what work I think we need to do to solve this, though, “impossible” or “hundreds of years of work” seem to be the vibe. I think math is a cool field because, more than in other fields, work from hundreds of years ago is still very relevant. Problems are hard and progress is slow in a way that I don’t know if people involved in other things really “get”. I feel like in math crowds I’m saying “no, don’t give up, maybe with a hundred years we can do it!” and in other crowds I’m like “c’mon guys, could we have at least 10 years, maybe?”

Anyway, I’m rambling a bit, but the point is that my vibe is very much “if the Russians defect, everyone dies”, “if the North Koreans defect, everyone dies”, “if Americans can’t bring themselves to trust other countries and don’t even try themselves, everyone dies”. So I’m currently feeling very “everyone slightly sane should commit and signal commitment as hard as they can”, because I know it will be hard to get humanity on the same page about something. Basically impossible, never been done before. But so is ASI alignment.
I haven’t read those links. I’ll check em out, thanks : ) I’ve read a few things by Drexler about, like, automated plan generation and then humans audit and enact the plan. It makes me feel better about the situation. I think we could go farther safer with careful techniques like that, but that is both empowering us and bringing us closer to danger, and I don’t think it scales to SI, and unless we are really serious about using it to map RSI boundaries, it doesn’t even prevent misaligned decision systems from going RSI and killing us.
It seems you might be worried the concept “accountability sink” could be used to excuse crimes that should not be excused. I’ll suggest that, if improperly applied, it could be used for that, but if properly applied it is more beneficial than this risk.
In your earlier comment you suggested that what is being said here is “Oh, the Holocaust is just an accountability sink”. That suggests you may be thinking of this in a true/false way rather than a complex-system-dynamics kind of way. I don’t think anyone here agrees with “the Holocaust was an accountability sink”, but rather, people would agree with “there were accountability sinks in the Holocaust, without which more people would have resisted more quickly rather than following orders”.
I think you can view punishment at least two ways. (a) As a way of hurting those who deserve to be hurt because they are bad. (b) As a way to signal to other people who would commit such a crime that they should not because the outcome for them will be bad.
I can’t fault people who feel a desire for (a), but I feel it should be viewed as perverse; like children who enjoy feeling powerful by playing violent video games, these are not the people I want as actual soldiers.
I feel (b) is a reasonable part of our goal to “prevent bad things from happening”. But people are only so influenced by fear of punishment. They may still defect if:
- they think they can get away with it,
- they are sufficiently desperate, or
- they believe what they are doing is ideologically right.
So if we want to go further in influencing those actors, we need to understand those cases, each of which I think includes some form of not thinking that what they are doing is bad, and accountability sinks may form a part of that.
You may be concerned that focus on accountability sinks will lead to letting those who should be punished off the hook, but we could flip that: maybe we punish everyone who has any involvement with an accountability sink, because they should have known better. I am currently poorer than I would have been if I had been more willing to engage with the nebulously evil society I was born into. I would feel some vindication if, for example, everyone who bought designer clothes manufactured in sweatshops was charged with a crime. I don’t think this is going to happen, for practical reasons, but I think your impression that “sink” implies we actually won’t hold people accountable is wrong; it is more that people in these situations don’t feel accountable, and it’s hard to tell who actually is accountable. I think “everyone is accountable when engaging with an accountability sink” is a reasonable perspective.
Looking at “accountability sinks” is good for predicting when people might engage in mass harmful systems. Predicting this is good, since we want to prevent it; so is educating people to watch out for accountability sinks, because if you willingly engage with an accountability sink, you are accountable and should be tried as such.
Note that this does have implications for capitalism / market-based society. There are many products that don’t have a third-party certificate showing they were audited and aren’t making use of accountability sinks to benefit from criminal things like illegal working conditions or compensation, or improper waste disposal. Buying such products should rightly be illegal. But unfortunately this would raise the cost of legal products, forcing many people who are currently near the poverty line below it. This is not something that should be taken lightly either.