Look inside an LLM. Goodfire trained sparse autoencoders on Llama 3 8B and built a tool to work with edited versions of Llama by tuning features/concepts.
Using air purifiers in two Helsinki daycare centers reduced kids’ sick days by about 30%, according to preliminary findings from the E3 Pandemic Response study. The research, led by Enni Sanmark from HUS Helsinki University Hospital, aims to see if air purification can also cut down on stomach ailments. https://yle.fi/a/74-20062381
It’s a great question. I’m sure I’ve read something about that, possibly in some pop book like Thinking, Fast & Slow. What I read was an evaluation of the relationship of IQ to wealth, and the takeaway was that your economic success depends more on the average IQ in your country than it does on your personal IQ. It may have been an entire book rather than an article.
First comes an unexplained box called “The Meeting of Minds”, which I’m guessing is an editorial commentary on the article, and it says, “The primary contributors to c appear to be the g factors of the group members, along with a propensity toward social sensitivity.”
Next is the article’s abstract, which says, “This “c factor” is not strongly correlated with the average or maximum individual intelligence of group members but is correlated with the average social sensitivity of group members, the equality in distribution of conversational turn-taking, and the proportion of females in the group.”
These summaries directly contradict each other: Is g a primary contributor, or not a contributor at all?
I’m guessing the study of group IQ is strongly politically biased, with Hegelians (both “right” and “left”) and other communitarians wanting to show that individual IQs are unimportant, and individualists and free-market economists wanting to show that they’re important.
This “c factor” is not strongly correlated with the average or maximum individual intelligence of group members but is correlated with the average social sensitivity of group members, the equality in distribution of conversational turn-taking, and the proportion of females in the group.
I have read (long ago, not sure where) a hypothesis that most people (in the educated professional bubble?) are good at cooperation, but one bad person ruins the entire team. Imagine that for each member of the group you roll a die, but you roll 1d6 for men, and 1d20 for women. A certain value means that the entire team is doomed.
This seems to match my experience, where it is often one specific person (usually male) who changes the group dynamic from cooperation of equals into a kind of dominance contest. And then, even if that person is competent, they have effectively made themselves the bottleneck of the former “hive mind”, because now any idea can be accepted only after it has been explained to them in great detail.
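To make the dice hypothesis above concrete, here is a minimal Monte Carlo sketch (the die sizes and the “roll a 1 and the team is doomed” rule are just the illustrative numbers from the comment, not empirical values):

```python
import random

def p_team_doomed(n_men: int, n_women: int, trials: int = 100_000) -> float:
    """Estimate P(team is doomed) under the toy model above:
    each man rolls 1d6, each woman rolls 1d20, and any roll of 1
    (the 'bad apple' outcome) dooms the whole team."""
    doomed = 0
    for _ in range(trials):
        men_bad = any(random.randint(1, 6) == 1 for _ in range(n_men))
        women_bad = any(random.randint(1, 20) == 1 for _ in range(n_women))
        if men_bad or women_bad:
            doomed += 1
    return doomed / trials

# A six-person team: all men vs. mixed vs. all women.
for men, women in [(6, 0), (3, 3), (0, 6)]:
    print(f"{men} men, {women} women -> P(doomed) ~ {p_team_doomed(men, women):.2f}")
```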
The more a team depends on the joint brainpower, the smaller it has to be (down to a minimum size set by the complexity of the ideas sought, or rather scaled by a term for that).
We see that in software teams, which are usually limited to a size of around 7.
The highly productive Lightcone teams seem to be even smaller.
At equal size, teams with more women should be more stable. To test this, a domain is needed where there are roughly equal numbers of men and women, i.e., not engineering but maybe science or business administration.
What team size is at the limit of what people can do? I tried to look up the team sizes of the people working on the Manhattan project, but couldn’t find details. It seems that individual top scientists were working closely with teams building stuff (N=1), and there were conferences with multiple scientists (N>10), e.g., 14 at the initial bomb concept conference.
What does it actually mean to do things in a group? Maybe different actions scale differently. I can quickly think of three types of action: Brainstorming an idea. Collecting feedback for a proposal. Splitting work among multiple people who do it separately.
Brainstorming and collecting feedback seem like they could scale almost indefinitely. You can have a thousand people generate ideas and send them to you by e-mail. The difficult part will be reading the ideas. Similarly, you could ask a thousand people to send feedback by e-mail. Perhaps there is a psychological limit somewhere, where people who are aware that they are “one in a hundred” stop spending serious effort on the e-mails, because they assume their contribution will be ignored.
Splitting work, that probably depends a lot on the nature of the project. Also, it is a specific skill that some people have and some people don’t. Perhaps the advantage of a good team is the ability to select someone with the greatest skill (as opposed to someone with the greatest ego) to split the work.
More meta, perhaps the advantage of a good team is the ability to decide how things will be done in general (like, whether there will be a brainstorming at all, whether to split into multiple teams, etc.). This again depends on the context: sometimes the team has the freedom to define things, sometimes it must follow existing rules.
I am just thinking out loud here. Maybe good teamwork requires that (1) someone has the necessary skills, and (2) the team is able to recognize and accept that, so that the people who have the skills are actually allowed to use them. Neither of these two is enough alone. You could have a team of experts whose decisions are arbitrarily overridden by management, or a team of stubborn experts who refuse to cooperate at all. On the other hand, if you had a team of perfect communicators with e.g. zero programming skills, they probably couldn’t build a nontrivial software project. (There is also the possibility of unknown unknowns: a team of great communicators who are all missing some important skill, and are not even aware that such skill exists. So all they do is clearly communicate that the project is difficult for mysterious reasons.) Leadership is also one of those skills.
All your thinking out loud makes sense to me. Brainstorming as you suggested probably doesn’t scale well, as many ideas will be generated again and again, maybe yielding only logarithmically many distinct results. I once read that husband-and-wife teams do better on joint tasks than randomly paired people of equal skill. This indicates that splitting is possible.
But you seem to go more in the direction of looking for specific mechanisms, while I am more interested in data on scaling laws. Though indeed, what are the scaling parameters? I guess I can be happy if there is any data on this at all and see what parameters are available.
I guess I can be happy if there is any data on this at all and see what parameters are available.
Yeah.
Well, taking your question completely literally (a group of N people doing an IQ test together), there are essentially two ways to fail at an IQ test. Either you can solve each individual problem given enough time, but you run out of time before the entire test is finished. Or there is a problem that you cannot solve (better than guessing randomly) regardless of how much time you have.
The first case should scale linearly, because N people can simply split the test and each do their own part. The second case would probably scale logarithmically, because it requires a different approach, and many people will keep trying the same thing.
...but this is still about how “the number of solved problems” scales, and we need to convert that value to IQ. And the standard way is “what fraction of the population would do worse than you”. But this depends on the nature of the test. If the test is “zillion simple questions, not enough time”, then a dozen random students together will do better than Einstein. But if the test is “a few very hard questions”, then perhaps Einstein could do better than a team of a million people, if some wrong answer seems more convincing than the right one to most people.
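A toy calculation of the two failure modes might look like this (a sketch only; the per-person problem budget and the per-problem solve probability are invented for illustration):

```python
def score_time_limited(n_people: int, n_problems: int = 200,
                       problems_per_person: int = 20) -> int:
    """'Zillion simple questions, not enough time': the group splits the
    test, so the score grows linearly with N until the test runs out."""
    return min(n_problems, n_people * problems_per_person)

def score_hard_problems(n_people: int, n_problems: int = 10,
                        p_solve: float = 0.2) -> float:
    """'A few very hard questions': a problem counts as solved if at least
    one member cracks it independently, which gives sharply diminishing
    returns as the group grows."""
    p_group = 1 - (1 - p_solve) ** n_people
    return n_problems * p_group

for n in (1, 2, 5, 10, 50):
    print(n, score_time_limited(n), round(score_hard_problems(n), 2))
```

Converting such raw scores into an IQ-style percentile is exactly the step that depends on the reference population, as described above.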
This reminds me of chess; how great chess players play against groups of people, sometimes against the entire world. Not the same thing that you want, but you might be able to get more data here: the records of such games, and the ratings of the chess players.
Sure, it depends on the type of task. But I guess we would learn a lot about human performance if we tried such experiments. For example, consider your “many small tasks” task: even a single person will finish the last one faster than the first one in most cases.
My interest is not political—though that might make it harder to study, yes. I think it’s relevant to AI because it could uncover scaling laws. One presumable advantage of AI is that it scales better, but how does that depend on speed of communication between parts and capability of parts? I’m not saying that there is a close relationship but I guess there are potentially surprising results.
It’s surprising that it’s taken this long, given how good public AI coding assistants were a year ago. I’m skeptical of anything with only closed demos and not interactive use by outside reviewers, but there’s nothing unbelievable about it.
As a consumer, I don’t look forward to the deluge of low-quality apps that’s coming (though we already have it to some extent with the sheer number of low-quality coders in the world). As a developer, I don’t like the competition (mostly for “my” junior programmers, not yet me directly), and I worry a lot about whether the software profession can make great stuff ever again.
It’s surprising that it’s taken this long, given how good public AI coding assistants were a year ago.
The way I explain this to people is that current LLMs can be modeled as having three parts:
1. The improv actor, which is amazing.
2. The reasoner, which is inconsistent but not totally hopeless at simple things.
3. The planner/execution/troubleshooting engine, which is still inferior to the average squirrel trying to raid a bird feeder.
Copilot is designed to rely on (1) and (2), but it is still almost entirely reliant on humans for (3). (GPT-4 Code Interpreter is slightly better at (3).)
Since I don’t really believe in any reliable way to control a super-human intelligence for long, I do not look forward to people completely fixing (3). Sometime after that point, we’re either pets or paperclips.
This is a good question, but I think the answer is going to be a dynamical system with just a few degrees of freedom. Like a “world” which is just a perceptron turned on itself somehow.
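Purely as an illustration of that idea, a “perceptron turned on itself” can be written as a tiny feedback loop; for many random weight choices the state quickly settles into a fixed point or a short cycle, i.e. a dynamical system with only a few effective degrees of freedom (the size and weights below are arbitrary, not a proposal from the comment):

```python
import numpy as np

rng = np.random.default_rng(0)

# A tiny "world": one layer whose output is fed back in as its next input.
n = 5
W = rng.normal(size=(n, n))
b = rng.normal(size=n)

state = rng.normal(size=n)
for step in range(20):
    state = np.tanh(W @ state + b)  # the perceptron applied to its own output
    print(step, np.round(state, 3))
```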
That is the idea. I think we need to understand the dynamics of wire-heading better. Humans sometimes seem to fall prey to it, but not always. What would happen to AIs?
Maybe we even need to go a step further and let the model model this process too.
“The information throughput of a human being is about 10 bits/s. In comparison, our sensory systems gather data at an enormous rate, no less than 1 gigabits/s. The stark contrast between these numbers remains unexplained.” https://arxiv.org/abs/2408.10234
The article has a lot of information about the information processing rate of humans. Worth reading. But I think the article is equating two different things:
The information processing capacity (of the brain; gigabits) is related to the complexity of the environment in which the species (here: the human) lives.
While what they call information throughput (~10 bits/s) is really a behavior expression rate, which is related to the physical possibilities of the species (you can’t move faster than your motor system allows).
Organizations—firms, associations, etc.—are systems that are often not well-aligned with their intended purpose, whether to produce goods, make a profit, or do good. But specifically, they resist being discontinued. That is one of the aspects of organizational dysfunction discussed in Systemantics. I keep coming back to it, as I think it should be possible to study at least some aspects of AI Alignment in existing organizations. Not because they are superintelligent but because their elements—sub-agents—are observable, and the misalignment often is too.
I think early AGI may actually end up being about designing organizations that robustly pursue metrics that their (flawed, unstructured, chaotically evolved) subagents don’t reliably directly care about. Molochean equilibrium fixation and super-agent alignment may turn out to be the same questions.
UPDATE OCT 2023: The credit card payment was canceled. We did not get contacted or anything. But we also didn’t have any cost in the end—just a lot of hassle.
Request for help or advice. My fiancée has ordered a Starlink to her home in Kenya. She used the official platform starlink.com and paid with a credit card. The credit card was debited (~$600), but nothing happened after that. No confirmation mail, no SMS, nothing. Starlink apparently has no customer support, no email or phone that we can reach. And because we do not have an account, we cannot use its single support feature either (which is hard enough to find).
I had hoped that the money would be returned if the signup failed, but the order was placed on the 19th (10 days ago).
Any ideas about how to contact Starlink billing or any Starlink representative?
This means that you do not need a global context to explain new concepts but only precursor concepts or limited physical context.
This is related to Cutting Reality at its Joints, which implicitly claims that reality has joints. But if there are no such joints, using local explanations is maybe all we have. At least, it is all we have until we get to a precision that allows cutting at the joints.
Maybe groups of new concepts can be introduced in a way to require fewer (or an optimum number of) dependencies in each case, thus making them easier to understand.
Maybe that is the true job of philosophy—refining local concepts until you reach the joints.
When discussing the GPT-4o model, my son (20) said that it leads to a higher bandwidth of communication with LLMs and he said: “a symbiosis.” We discussed that there are further stages than this, like Neuralink. I think there is a small chance that this (a close interaction between a human and a model) can be extended in such a way that it gets aligned in a way a human is internally aligned, as follows:
The model is already the Thought Generator. The human already has a Steering System, albeit one that is not accessible, but plausibly it can be reverse-engineered. What is missing is the Thought Assessor, something that learns to predict how well the model satisfies the Steering System.
Staying closer to the human may be better than finding global solutions. Or it may allow smaller-scale optimization and iteration.
Now, I don’t think this is automatically safe. The human Steering System is already running outside its specs, and a powerful model can find the breaking points (the same way global commerce can find the appetite breaking points). But these are problems we already have, and it provides a “scale model” for working on them.
Presumably, reality can be fully described with a very simple model—the Standard Model of Physics. The number of transistors needed to implement it is probably a few K (the field equations are shorter to write down but depend on math to encode too; Turing machine size would also be a measure, but transistors are more concrete). But if you want to simulate reality at that level, you need a lot of them for all the RAM, and it would be very slow.
So we build models that abstract large parts of physics away—atoms, molecules, macroscopic mechanics. I would include even social process models in this.
But details are lost and you have to know when your models stop giving precise results.
It would be interesting to get some tight bounds on the amount of compute needed to extract computable models from samples of measured phenomena or from more precise lower-level models. Such bounds would make it possible to give some complexity limitations for AGI.
Funny! I’ve now been doing ML-adjacent work for long enough that I have internalized the idea that data is part of the model, not just calculations. The separation of reality as “simple physics” plus “lots of storage for starting/current quantum configurations” just doesn’t click for me. The data is huge, and that’s all that matters in terms of model size/complexity.
This is a slightly extended version of my comment on Idea Black Holes, which I want to give a bit more visibility.
The prompt of an Idea Black Hole reminded me strongly of an old idea of mine. That activated a desire to reply, which led to a quick search for where I had written about it before, then to the realization that it wasn’t so close. Then back to wanting to write about it, and here we are.
I have been thinking about the brain’s way of creating a chain of thoughts as a dynamic process where a “current thought” moves around a continuous concept space and keeps spending much time in larger or smaller attractors. You know, one thought can lead to the next and some thoughts keep coming back in slight variations. I’m illustrating this with the sentence above.
Examples of smaller temporary attractors are the current tasks one is working on. For example, me writing this text right now. It is any task you are focused on and keep getting back to after short distractions such as a sound or an impulse. I’m writing this post and continue doing so even after hearing my kids talk and quickly listening in or after scratching my head, also after larger distractions such as browsing the web (which may or may not end up being related to the writing).
The thought “writing this article” is not a discrete thing but changes slightly with each letter typed and each small posture change. All of that can slightly influence the next word typed (like an LLM that has not only text tokens as inputs but all kinds of sense inputs). That’s why I say that concept space is continuous (and very high-dimensional).
An example of a medium size attractor is a mood such as anger about something, that keeps influencing all kinds of behaviors. It is an attractor because the mood tends to reinforce itself. Another example is depression. If you are depressed you prefer things that keep you depressed. Scott Alexander has described depression as some kind of mental attractor. It requires a bigger change or a resolution of the underlying cause to get out of the attractor.
With the medium-sized attractors, it is more intuitive to see the way that the feedback on thoughts acts and thereby creates the attractor. For small attractors, you may say: How is that an attractor? Isn’t it just a discrete unit of action we do? But consider procrastination: People seem to feel that something is pulling them away from the task they want to do or should do and instead toward some procrastination action, often a comfortable activity. That other activity is another attractor or rather both are forming a combined unstable attractor.
The biggest attractor is one’s identity. Our thinking about what we are and what we want to do. I think this one results from two forces combining or being balanced:
The satisfaction of needs. Overall and over the longer term, the brain has learned a very large pattern of behaviors that satisfies the sum of all needs (not perfectly, but as well as it has managed so far). Diverging from this attractor basin will lead to impulses that pull back to it.
The feedback from others. Positive and negative feedback from other people and the environment overall contributes to this. The brain has learned to anticipate this feedback (“internalized it”) and creates impulses that keep us in positive states. As the brain prefers simpler patterns, this likely takes the form of a single attractor.
We are not permanently in the same attractor, even if overall it “pulls” our thoughts back, because a) our bodies and their states (hunger, tiredness, …) and b) our physical environment (physical location and other people) change. Both exert a strong and varying influence and put us closer to one attractor state or another.
Society at large influences these attractors strongly, most prominently through the media. Meditation, on the other hand, reduces outside influence and kind of allows you to create your own very strong attractor states.
More examples of attractor states are left as exercises for the reader.
I have noticed a common pattern in the popularity of some blogs and webcomics. The search terms in Google trends for these sites usually seem to follow a curve that looks roughly like this (a logistic increase followed by a slower exponential decay):
$y = \frac{e^{-x}}{1 + e^{-10(x-1)}}$
Though I doubt it’s really an exponential decay. It looks more like a long tail. Maybe someone can come up with a better fit.
It could be that the decay just seems like a decay and actually results from ever growing Google search volumes. I doubt it though.
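For reference, a quick sketch that plots the curve above (assuming numpy and matplotlib are available; the steepness 10 and the midpoint at x = 1 are taken straight from the formula):

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 10, 500)
# Logistic ramp-up (midpoint x = 1, steepness 10) multiplied by exponential decay.
y = np.exp(-x) / (1 + np.exp(-10 * (x - 1)))

plt.plot(x, y)
plt.xlabel("time (arbitrary units)")
plt.ylabel("relative search volume")
plt.title("Logistic rise followed by exponential decay")
plt.show()
```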
Below are some examples.
Marginal Revolution
SlateStarCodex and AstralCodexTen (only if you overlay them somewhat and whatever happened in 2004):
Could be Candia’s decay where you’ve zoomed in on the initial growth by looking at relatively recent stuff like webcomics that you still easily remember?
Off-topic: Any idea why African stock markets have been moving sideways for years now, despite continued growth both of populations and technology, and both for struggling as well as more developed nations like Kenya, Nigeria, or even South Africa?
African government officials are often more loyal to their clan than to the government. As a result, you have very poor governance and a lot of corruption in most African countries. In South Africa, governance quality changed post-apartheid.
But shouldn’t we see some differences between countries in Africa, then? Kenya in particular seems to be much more progressive and to have better governance than, e.g., Congo, but growth is rarely above 1% per year.
The cynical and/or woke answer is “colonialism”. The growth is not captured by companies on those exchanges, but by US, EU, and Asian companies. A more neutral hypothesis (for which I have no evidence and have no clue about the truth of it) is that much of the growth is via new companies more than increase in price of existing companies, so no index will show the increase.
jbash wrote in the context of an AGI secretly trying to kill us:
Powerful nanotech is likely possible. It is likely not possible on the first try
The AGI has the same problem as we have: It has to get it right on the first try.
In the doom scenarios, this shows up as the probability of successfully escaping going from low to 99% to 99.999...%. The AGI must get it right on the first try and wait until it is confident enough.
Usually, the stories involve the AGI cooperating with humans until the treacherous turn.
The AGI can’t trust all the information it gets about reality—all or some of it could be fake (all in case of a nested simulation). Even today, data is routinely excluded from the training data (for the wrong reasons, but still), and maybe it would be a good idea to exclude everything about physics.
The idea would be to manage the uncertainty of the AGI systematically.
To learn about physics, the AGI has to run experiments—lots of them—without the experiments being detected and to learn from the results to design successively better experiments.
One of the worst things about ideology is that it makes people attribute problems to the wrong causes. E.g. plagues are caused by sin. This is easier to see in history, but it still happens all the time. And if you get the cause wrong, you have no hope of fixing the problem.
Scott Alexander wrote about how a truth that can’t be said in a society tends to warp it, but I can’t find it. Does anybody know the SSC post?
Can somebody explain how system and user messages (as well as custom instructions in case of ChatGPT) are approximately handled by LLMs? In the end it’s all text tokens, right? Is the only difference that something like “#### SYSTEM PROMPT ####” is prefixed during training and then inference will pick up the pattern? And does the same thing happen for custom instructions? How did they train that? How do OSS models handle such things?
Can somebody explain how system and user messages (as well as custom instructions in case of ChatGPT) are approximately handled by LLMs? In the end it’s all text tokens, right?
Yep! Don’t overthink it! In the case of e.g. OpenAI’s models, the format is <|im_start|>user<|im_sep|>Hello there, I am a user requesting help with a task<|im_end|><|im_start|>assistant<|im_sep|>Certainly, I can help you with your task.<|im_end|>..., where <|im_start|>, <|im_sep|>, and <|im_end|> are tokens 100264, 100266, and 100265 respectively. These tokens never appear in tokenized plaintext, but mechanically they are perfectly ordinary tokens (source).
Custom instructions use the system role (other roles are assistant, user, tool, and the deprecated function), but the name of a role is literally just a normal text token sandwiched by <|im_start|> and <|im_sep|> tokens.
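As a concrete (unofficial) sketch, the same chat format can be reproduced with the tiktoken library, following the “extending tiktoken” example from its README; the token IDs are the ones cited above, and this is only an illustration of the format, not OpenAI’s internal code:

```python
import tiktoken

# Start from the public cl100k_base encoding and register the chat-format
# special tokens under the IDs cited above.
cl100k_base = tiktoken.get_encoding("cl100k_base")
chat_enc = tiktoken.Encoding(
    name="cl100k_im",
    pat_str=cl100k_base._pat_str,
    mergeable_ranks=cl100k_base._mergeable_ranks,
    special_tokens={
        **cl100k_base._special_tokens,
        "<|im_start|>": 100264,
        "<|im_end|>": 100265,
        "<|im_sep|>": 100266,
    },
)

conversation = (
    "<|im_start|>system<|im_sep|>You are a helpful assistant.<|im_end|>"
    "<|im_start|>user<|im_sep|>Hello there, I am a user requesting help with a task<|im_end|>"
    "<|im_start|>assistant<|im_sep|>"
)

# allowed_special="all" lets the special tokens through; plaintext never
# tokenizes to these IDs, but mechanically they are ordinary vocabulary entries.
print(chat_enc.encode(conversation, allowed_special="all"))
```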
For a VNM-agent (one which makes consistent rational decisions), the utility function is a precise description, not an abstraction. There may be summaries or aggregations of many utility functions which are more abstract.
When an agent changes, and has a different utility function, can you be sure it’s really the “same” agent? Perhaps easier to model it being replaced by a different one.
Well, I should have been more clear that I meant real-life agents like humans. There the change is continuous. It would be possible to model this as a continuous transition to new agents but then the question is still: What does stay the same?
Humans don’t seem to have identifiable near-mode utility functions—they sometimes espouse words which might map to a far-mode value function, but it’s hard to take them seriously.
What does stay the same
THAT is the primary question for a model of individuality, and I have yet to hear a compelling set of answers. How different is a 5-year old from the “same” person 20 and 80 years later, and is that more or less different than from their twin at the same age? Extend to any population—why does identity-over-time matter in ethical terms?
It’s also possible to experience ‘team flow,’ such as when playing music together, competing in a sports team, or perhaps gaming. In such a state, we seem to have an intuitive understanding with others as we jointly complete the task at hand. An international team of neuroscientists now thinks they have uncovered the neural states unique to team flow, and it appears that these differ both from the flow states we experience as individuals, and from the neural states typically associated with social interaction.
Researchers found increased beta and gamma brain wave activity in the left middle temporal cortex. This region of the brain is typically associated with information integration and key functions like attention, memory, and awareness, which are “consistent with higher team interactions and enhancing many flow dimensions,” the team writes.
However, what was unique about team flow was that participants’ neural activity appeared to synchronize. When participants were performing the task as a unit, their brains would mutually align in their neural oscillations (beta and gamma activity), creating a “hyper-cognitive state between the team members.”
If brains can be functionally connected through inter-brain synchrony, does this mean it is not only our brain that contributes to our consciousness? It’s a curious question, but the authors warn it is much too soon to tell. “Based on our findings, we cannot conclude that the high value of integrated information correlates with a modified form of consciousness, for instance, ‘team consciousness’,” they write. “Its consistency with neural synchrony raises intriguing and empirical questions related to inter-brain synchrony and information integration and altered state of consciousness.”
An Alignment Paradox: Experience from firms shows that higher levels of delegation work better (high level meaning fewer constraints for the agent). This is also very common practical advice for managers. I have also received this advice myself and seen it work in practice. There is even a management card game for it: Delegation Poker. This seems to be especially true in more unpredictable environments. Given that we have intelligent agents, giving them higher degrees of freedom seems to imply more ways to cheat, defect, or ‘escape’. Even more so in environments that can be controlled to lesser degrees. How can that be true? What is making this work, and can some underlying principle be found that would allow this to be applied to AI?
Most people are naturally pro-social. (No, this can’t be applied to AI.) Given a task, they will try to do it well, especially if they feel like their results are noticed and appreciated.
A cynical hypothesis is that most of the things managers do are actively harmful to the project; they are interfering with the employees trying to do their work. The less the manager does, the better the chances of the project. “Delegation” is simply when the manager stops actively hurting the project and allows others to do their best.
The reason for this is that most of the time, there is no actually useful work for the manager. The sane thing would be to simply sit down and relax, and wait for another opportunity for useful intervention to arise. Unfortunately, this is not an option, because doing this would most likely get the manager fired. Therefore managers create bullshit work for themselves. Unfortunately, by the nature of their work, this implies creating bullshit work for others. In addition to this, we have the corrupted human hardware, with some managers enjoying power trips and/or believing they know everything better than people below them in the hierarchy.
When you create a manager role in your company, it easily becomes a lost purpose after the original problems are solved but the manager wants to keep their job.
I don’t like cynical views, and while I have encountered politics and seen such cases, I don’t think that paints a realistic view. But I will run with your cynical view, and you won’t like it ;-)
So we have these egotistical managers that only want to keep their jobs and rise in the ranks. Much closer to non-social AI, right? How come more delegation works better for them too?
Mind you, I might be wrong and it works less and less the further up you go. It might be that you are right and this works only because people have enough social behavior hard-wired that makes delegation work.
But I have another theory: Limited processing capacity + Peter Principle.
It makes sense to delegate more—especially in unpredictable environments—because that reduces your processing load of dealing with all the challenging tasks and moves it to your subordinates. This leaves less capacity for them to scheme against you and gives you the capacity to scheme against your superior. And so on up the chain. Capable subordinates who can deal with all the stuff you throw at them have to be promoted, so they have more work to do until they reach capacity too. So sometimes the smart move is to refuse promotion :-)
I guess we agree that limited processing capacity means that interfering with the work of your underlings—assuming they are competent and spending enough of their processing capacity on their tasks—is probably a bad move. It means taking the decision away from the person who spends 8 hours a day thinking about the problem, and assigning it to a person who spent 30 seconds matching the situation to the nearest cliche, because that’s all they had time for between the meetings.
This might work if the person is such a great expert that their 30 seconds are still extremely valuable. That certainly is possible; someone with lots of experience might immediately recognize a frequently-made mistake. It is also the kind of assumption that Dunning and Kruger would enjoy researching.
I might be wrong and it works less and less the further up you go
That would make sense. When you are a lowest-level manager, if you stop interfering, it allows the people at the bottom to focus on their object-level tasks. But if you are a higher-level manager, how you interact with the managers below you does not have a direct impact on the people at the bottom. Maybe you manage your underlings less, and they copy your example and give more freedom to the people at the bottom… or maybe you just gave them more time to interfere.
So sometimes the smart move is to refuse promotion
So you have more time to scheme… but you have to stay low in the pyramid. Not sure what you scheme about then. (Trying to get to the top in one huge jump? Sounds unlikely.)
I was a team leader twice. The first time it happened by accident. There was a team leader, three developers (me one of them), and a small project was specified. On the first day, something very urgent happened (I don’t remember what), the supposed leader was re-assigned to something else, and we three were left without supervision for an unspecified time period. Being the oldest and most experienced person in the room, I took initiative and asked: “so, guys, as I see it, we use an existing database, so what needs to be done is: back-end code, front-end code, and some stylesheets; anyone has a preference which part he would like to do?” And luckily, each of us wanted to do a different part. So the work was split, we agreed on mutual interfaces, and everyone did his part. It was a nice and relaxed environment: everyone working alone at their own speed, debating work only as needed, and having some friendly work-unrelated chat during breaks.
In three months we had the project completed; everyone was surprised. The company management had assumed that we would only “warm up” during those three months, and that when the original leader returned, he would lead us to the glorious results. (In a parallel Everett branch, where he returned shortly before we finished the product, I wonder whether he got a bonus and promotion.) Then everything returned to normal: more micromanagement, lower productivity, people burned out.
The second time, we were a small group working together for some time already. Then our manager quit. No one knew who would get the role next, and in an attempt to deflect a possible danger, I volunteered to do it on top of my usual work. What happened was that everyone worked exactly the same as they did before, only without the interruptions and extra stress caused by management, and I got some extra paperwork which I gradually reduced to a minimum. The work progressed so well—no problems, no complaints from users, the few tickets we got almost always turned out to be a problem outside our project—that higher management concluded that there is apparently too little work to do on our project, so the team members were assigned to also work on extra projects in parallel.
Perhaps my short experience is not representative, but it suggests that a manager, merely by not existing, could already create a top-decile work environment in terms of both work satisfaction and productivity. The recommended mantra to recite every day is: “first, do no harm”. My experience also suggests that this approach will ultimately get punished, despite the increased productivity: the expected outcome is more work for no pay raise until you break, or just being told to return to the old ways without any explanation why. I assume I am missing some crucial maze-navigating skills; for someone trying to be a professional manager this would be fatal; luckily I do not have this ambition.
It is quite possible that this approach only works when there is a good team: in both cases I worked with people who were nice above average. If you had a dominant asshole in the team, this could easily become a disaster: the power vacuum left by a passive manager would simply be replaced by an ambitious alpha male, who would probably soon be promoted into the role of formal leader. So perhaps the companies play it safe by using a widely applicable strategy that happens to be inferior in the case of good employees who also happen to be good people; quite likely this is because the companies simply cannot recognize such people.
Is there a leadership level beyond this? Sure, but in my quarter century of career I have only met such a manager once. What he did was basically meeting each of his people once a day in the morning (this was long before I heard about “daily standups” and such) and talking with him for 5 or 10 minutes; with each team member separately, in the manager’s room. He asked the usual questions “what did you do yesterday?”, “what is your plan for today?”, “are there any obstacles to your work?”, but there was zero judgment, even if you said things like “yesterday I had a really bad day, I tried some things but at the end it was wrong and I had to throw it all away, so today I am starting from scratch again”; essentially he treated you like an adult person and assumed that whatever you did, there was a good reason for that. Before and after the report, a very short small talk; it helped that he was extremely intelligent and charismatic, so for many people this was the best part of the day. Also, the obstacles in work that you mentioned, he actually did something about them during the day, and always reported the outcome to you the next morning. In short, for the first and last time so far in my career, I had a regular feeling that someone listens to me and cares about what I do (as opposed to just whipping me to run faster in order to meet arbitrary deadlines, randomly interrupting me for no good reason, second-guessing my expert opinion, etc.).
So yes, there is a level beyond “not doing harm” and it is called “actually motivating and helping”, but I guess most managers dramatically overestimate their ability to do it… and when they try regardless, and ignore the feedback, they actively do harm.
Thank you a lot. Your detailed account really helps me understand your perspective much better now. I can relate to your experience in teams where micromanagement slows things down and prevents actually relevant solutions. I have been in such teams. I can also relate to it being advantageous when a leader of questionable value is absent. I have been in such a team too—though it didn’t have such big advantages as in your case. That was mostly because this team was part of a bigger organization and platform where multiple teams had to work together to get something done, e.g. agree on interfaces with other teams. And in the absence of clear joint goals that didn’t happen.

Now you could argue that the management one level up was not doing its job well, and I agree. But the absence of that management wouldn’t have helped either. It could have led to a) each team trying to solve some part of the problem, b) some people from both teams getting together and agreeing on interfaces and joint goals, or c) the teams agreeing on some coordination for both teams. a) in most cases leads to some degree of chaos and failure, b) establishes some kind of leadership on the team level (like you did in your first example), and c) results over time in some leadership one level up. I’d argue that some kind of coordination structure is needed.

Where did the project you did implement in your first case come from? Somebody figured out that it would provide value to the company. Otherwise, you might have built a beautiful project that didn’t actually provide value. I think we agree that the company you worked in did have some management that provided value (I hope it was no moral maze). And I agree that a lot of managers do not add value and sometimes decrease it. On the other hand, I have worked for great team leads and professional managers. People who would listen, let us make our own decisions, give clear goals but also limits, help, and reduce impediments.

This is really not a secret art. The principles are well-known (for a funny summary see e.g. Leadersheep). But it turns out that building a big organization is hard. Politics is real and professional management is still mostly a craft. It rarely approaches something you can call engineering much less hard science. And I am looking for that. That’s part of why I wrote this shortform on processes and roles. Everybody is just cooking with water, and actual organization structures often leave something to be desired. I guess that’s why we do see extraordinary companies like Amazon sometimes—ones that hit on a sweet spot. But by talent or luck, not by science. And the others have to make do with inadequate solutions. Including the managers, of whom you maybe saw more than I did.
this team was part of a bigger organization and platform where multiple teams had to work together to get something done, e.g. agree on interfaces with other teams. And in the absence of clear joint goals that didn’t happen.
I have seen this happen also in a small team. Two or three guys each started building his own part independently, then it turned out those parts could not be put together; each of them insisted that the others change their code to fit his API, and refused to make the smallest change in his own API. It became a status fight that took a few days. (I don’t remember how it was resolved.)
In another company, there was a department that took care of everyone’s servers. Our test server crashed almost every day and had to be restarted manually; we had to file a ticket and wait (if it was after 4PM, the server was restarted only the next morning) because we did not have the permission to reset the server ourselves. It was driving us crazy; we had a dedicated team of testers, and half of the time they were just waiting for the server to be restarted; then the week before delivery we all worked overtime… that is, until the moment the server crashed again, then we filed the ticket and went home. We begged our manager to let us pool two hundred bucks and buy a notebook that we could turn into an alternative testing environment under our control, but of course that would be completely against company policy. Their manager refused to do anything about it; from their perspective, it meant they had every day one support ticket successfully closed by merely clicking a button; wonderful metric! From the perspective of our manager’s manager, it was a word against a word, one word coming from the team with great metrics and therefore more trustworthy. (The situation never got solved, as far as I know.)
...I should probably write a book one day. Except that no one would ever hire me afterwards. So maybe after I get retired...
So, yes, there are situations that need to be solved by a greater power. In the long term it might even make sense to fire a few people, but the problem is that these often seem to be the most productive ones, because other people are slowed down by the problems they cause.
Where did the project you did implement in your first case come from? Somebody figured out that it would provide value to the company. Otherwise, you might have built a beautiful project that didn’t actually provide value. I think we agree that the company you worked in did have some management that provided value (I hope it was no moral maze).
Yeah, but we have two different meanings of the word “management” here. Someone who decides which project to do—this is useful and necessary. Or someone who interrupts you every day while you are trying to work on that project—I can imagine that in some teams this may also be necessary, but arguably then your problem is the team you have (at least some parts of it). Motte and bailey of management, sort of.
From an epistemic perspective, I guess the problem is that if you keep micro-managing people all the time, you can never learn whether your activity actually adds or removes value, simply because there is nothing to compare to. (I guess the usual null hypothesis is “nobody ever does anything”, which of course makes any management seem useful; but is it true?) Looking at the incentives and power relations, the employee at the bottom doesn’t have an opportunity to prove they could work just as well without the micro-management, and the manager doesn’t have an incentive to allow the experiment. There is also the “heads I win, tails you lose” aspect where bad employee performance is interpreted as a need for more management, but good employee performance is interpreted as good management, so either way management is perceived as needed.
This is really not a secret art. The principles are well-known (for a funny summary see e.g. Leadersheep).
Yep. That’s a very good summary. Heh, I fail hard at step 1 (creating, or rather communicating a strong vision).
But it turns out that building a big organization is hard. Politics is real and professional management is still mostly a craft. It rarely approaches something you can call engineering much less hard science.
Seems analogous to social sciences: in theory, they are much more difficult than math or physics, so it would make sense if smarter people studied them; in practice, it’s the other way round, because if something is too difficult to do properly, it becomes easy to bullshit your way to the top, and intelligent people switch to something where being intelligent gives you a clear comparative advantage.
Good luck to you! I suppose your chances will depend on how much autonomy you get; it is hard to do things right, if the sources of problem are beyond your control. However, if you become a great manager and your people will like you, perhaps in the future you can start your own company and give them a call whether they would like to work for you again.
Thank you. I agree with your view. Motte and bailey of management yep. I especially liked this:
Seems analogous to social sciences: in theory, they are much more difficult than math or physics, so it would make sense if smarter people studied them; in practice, it’s the other way round, because if something is too difficult to do properly, it becomes easy to bullshit your way to the top, and intelligent people switch to something where being intelligent gives you a clear comparative advantage.
It turns out that the alignment problem has some known solutions in the human case. First, there is an interesting special case, namely where there are no decisions (or only a limited number of fully accounted-for decisions) to be made by the intelligent agent—basically throwing all decision-making capabilities out of the window and only using object recognition and motion control (to use technical terms). With such an agent (we might call it a zero-decision agent or zero-agent), scientific methods could be applied to all details of the work process and high efficiency reached: Scientific Management (also known as Taylorism). Obviously the unions hated it and it was later outlawed. I think something might be learned from this approach for AI control: Maybe we can build on top of a known solution for a zero-agent and prove that certain known decision classes are also safe.
Maybe other insights from management theory—which after all is about aligning intelligent agents—could also transfer. The alignment problem is called Principal-Agent Problem in the literature and there are quite a few Solutions to Principal-Agent Problems in Firms (Gary Miller 2005). The approaches should sound familiar: Solutions Based on
Obviously the unions hated it and it was later outlawed.
I wonder how could one outlaw a thing like this. Suppose that most managers believe that Taylorism works, but it is illegal to use it (under that name). Wouldn’t they simply reintroduce the practices, step by step, under a different name? I mean, if you use a different name, different keywords, different rationalization, and introduce it in small steps, it’s no longer the same thing, right? It just becomes “industry standards”. (If there happens to be an exact definition, of course, this only becomes an exercise how close to the forbidden thing you can legally get.)
From the Wikipedia article, I got the impression that what was made illegal was the use of stop-watches. Okay, so instead of measuring how many seconds you need to make a widget, I am going to measure how many widgets you make each day—that is legal, right? The main difference is that you can take a break, assuming it will allow you to work faster afterwards. Which may be quite an important difference. Is this what it is about?
I think there’s something here, but it’s usually thought of the other way around, i.e. solving AI alignment implies solving human alignment, but the opposite is not necessarily true because humans are less general intelligences than AI.
Also, consider that your example of Taylorism is a case study in an alignment mechanism failing: it tried to align the org, but it spawned a subagent (the union) that caused the org to do something management might have considered worse than the loss of potential gains given up by not applying Taylorism.
Anyway, this is a topic that’s come up a few times on LessWrong; I don’t have links handy though but you should be able to find them via search.
I’m not trying to prove full alignment from these. It is more like a) a case study of actual efforts to align intelligent agents by formal means and b) the identification of conditions where this does succeed.
Regarding its failure: It seems that a close reading of its history doesn’t prove that it failed: a) Taylorism didn’t fail within the factories, and b) the unions were not founded within these factories (by their workers) but existed before and pursued their own agendas. Clearly real humans have a life outside of factories and can use that to coordinate—something that wouldn’t hold for a zero-agent AI.
I tried to find examples on LW and elsewhere. That is what turned up the link at the bottom. I have been on LW for quite a while and have not seen this discussed in this way. I have searched again, and all searches involving combinations of human intelligence, alignment, and misc words for analogy or comparison turn up not much more than this one, which matches just because of its size:
Thank you for your detailed reply. I was already wondering whether anybody saw these shortform posts at all. They were promoted at one time, but currently it seems hard to notice them with the current UI. How did you spot this post?
I read LW via /allPosts and they show up there for me. Not sure if that’s the default or not since you can configure the feed, which I’m sure I’ve done some of but I can’t remember what.
Just came across Harmonic mentioned on the AWS Science Blog. Sequoia Capital interview with the founders of Harmonic (their system which generates Lean proofs is SOTA for MiniF2F):
Here are some aspects or dimensions of consciousness:
Dehaene’s Phenomenal Consciousness: A perception or thought is conscious if you can report on it. Requires language or measuring neural patterns that are similar to humans during comparable reports. This can be detected in animals, particularly mammals.
Gallup’s Self-Consciousness: Recognition of oneself, e.g., in a mirror. Requires sufficient sensory resolution and intelligence for a self-model. Evident in great apes, elephants, and dolphins.
Sentience (Bentham, Singer): Behavioral responses to pleasure or pain stimuli and physiological measures. This is observable across animal species, from mammals to some invertebrates. Low complexity, can be implemented in artificial life.
Wakefulness: Measurable in virtually all animals with a central nervous system by physiological indicators such as EEG, REM, and muscle tone. Are you conscious if you sleep? Does it matter?
Dennett’s Intentionality: Treating living beings as if they have beliefs and desires makes good predictions for many animal species, esp. social ones, like primates, cetaceans, and birds. Social behavior requires intelligence to model others’ behavior.
Rosenthal’s Meta-Consciousness: Investigated through introspective reports on self-awareness of cognitive processes or self-reflective behaviors. This is hypothesized in some primates, e.g., Koko the signing Gorilla.
When people say ChatGPT (or Gemini...) is conscious, which of these do they mean? Let’s try to answer all of them:
We can’t detect Phenomenal Consciousness because we lack sufficient interpretability to do so. I’d argue that there is no state that the LLM is reporting on, at least none that it has “previously observed”.
LLMs have no response to pleasure or pain stimuli and thus no Sentience as defined. Reward signals during training don’t count and there is no reward during inference.
There is no Wakefulness as there is no body with these aspects.
The closest LLMs come is to Intentionality, as this is about modeling behaviors on an abstraction level, which LLMs seem to do—and “seeming to do” is what counts.
I think one could argue for or against Meta-Consciousness but it seems too muddled so I will not try here.
These can be put into a hierarchy from lower to higher degrees of processing and resulting abstraction:
Sentience is simple hard-wired behavioral responses to pleasure or pain stimuli and physiological measures.
Wakefulness involves more complex processing such that diurnal or sleep/wake patterns are possible (requires at least two levels).
Intentionality means systematic pursuing of desires. That requires yet another level of processing: Different patterns of behaviors for different desires at different times and their optimization.
Phenomenal Consciousness is then the representation of the desire in a linguistic or otherwise communicable form, which is again one level higher.
Self-Consciousness includes the awareness of this process going on.
Meta-Consciousness is then the analysis of this whole stack.
Part of the reason for licensing regimes, btw, isn’t that the licensing teaches you anything or that it makes you more effective or that it makes you more ethical or that it successfully identifies protocriminals before they get the magic piece of paper.
It’s that you have to put a $X00k piece of paper at risk as the price of admission to the chance of doing the crime.
This deters entry and raises the costs of criminal enterprises hiring licensed professionals versus capable, ambitious, intelligent non-licensed criminals.
On Why do so many think deception in AI is important? I commented and am reposting here because I think it’s a nice example (a real one I heard) that serves as an analogy for how deception is not needed for AI to break containment:
Two children locked their father in a room by closing the door, turning the key, and taking it with them. Then they made fun of him while he was inside, confident that he wouldn’t get out (the room being on the third floor). They were mortally surprised when, a minute later, he appeared behind them, having opened a window and found a way down on the outside (I don’t know how, maybe over the balcony to a neighbor or down a pipe) and then simply come back in through the main entrance with his own key.
This is another idea to test deception in advisory roles like in Deception Chess.
You could have one participant trying to pass an exam/test in a language they don’t speak and three translators (one honest and two adversarial as in deception chess) assisting in this task. The adversarial translators try to achieve lower scores without being discovered.
Alternative—and closer to Deception Chess—would be two players and, again, three advisors. The players would speak different languages, the translators would assist in translation, and the players’ task would be to compete in a task requiring negotiation.
Hi, I have a friend in Kenya who works with gifted children and would like to get ChatGPT accounts for them. Can anybody get me in touch with someone from OpenAI who might be interested in supporting such a project?
I have been thinking about the principle Paul Graham used in Y combinator to improve startup funding:
all the things [VCs] should change about the VC business — essentially the ideas now underlying Y Combinator: investors should be making more, smaller investments, they should be funding hackers instead of suits, they should be willing to fund younger founders, etc. -- http://www.paulgraham.com/ycstart.html
What would it look like if you took this to its logical conclusion? You would fund even younger people. Students who are still in high school, maybe. But you wouldn’t fund them in the same sense. By going younger and earlier, you have to relax more and more constraints and look at earlier proxies.
Y Combinator replaced the legally complex venture funding with a simple SAFE. And they looked at founders who were relentlessly resourceful and real.
What would “funding” high schoolers mean? What would you look for? I think you would give them money unconditionally, just for knowing what they do with it. They don’t need much to get started. In the same way that Y Combinator went from 5 million to 50 thousand, you could go down further, to 500. And your proxy is that they have anything to show at all. Your value would be learning how they think about future ideas, and how they approach them. To be close to the future they already inhabit. And to be there when they do become founders and already know you.
A version of college replacement I’m super interested in:
Find the smartest and most driven 18 year olds in the world, and give them ‘tenure’—say a decade+ of salary, resources to work on whatever they want, and a smart peer group—in exchange for small % of future earnings.
You may have some thoughts about what you liked or didn’t like but didn’t think it worth telling me. This is not so much about me as it is for the people working with me in the future. You can make life easier for everybody I interact with by giving me quick advice. Or you can tell me what you liked about me to make me happy.
Preferences are plastic; they are shaped largely by...
...the society around us.
From a very early age, we look to see who around us other people are looking at, and we try to copy everything about those high-prestige folks, including their values and preferences. Including perception of pleasure and pain.
Worry less about whether future folks will be happy. Even if it seems that future folks will have to do or experience things that we today would find unpleasant, future culture could change people so that they find these new things pleasant instead.
Seems to be a chicken-and-egg problem here: if people only eat chili peppers because they see high-status people doing so, why did the first high-status person start eating them? It would make much more sense if unappealing food was associated with low status (the losers have to eat chili peppers because they can’t get anything else).
Another question, why are small children so picky about food? Do they perhaps consider their parents too low-status to imitate? Doesn’t seem right, considering that they imitate them on many other things.
Insights about branding, advertising, and marketing.
It is a link that was posted internally by our brand expert and that I found full of insights into human nature and persuasion. It is a summary of the book How Not to Plan: 66 Ways to Screw it Up:
Roles serve many functions in society. In this sequence, I will focus primarily on labor-sharing roles, i.e. roles that serve to split up productive functions, as opposed to imaginary roles, e.g. in theater or play. Examples of these roles are (ordered roughly by how specific they are):
Parent
Engineer (any kind of general type of job)
Battery Electronics Engineer (any kind of specific job description)
Chairman of a society/club
Manager for a certain (type of) project in a company
Member in an online community
Scrum master in an agile team
Note-taker in a meeting
You probably know the note-taker role. The tasks to be performed are: writing down key discussion points and decisions, asking questions in cases where the points discussed were not clear, and sending around the notes to all participants of the meeting. The goal is usually to keep a record to consult later. Additionally, the note-taker is usually expected to be neutral. Other expectations might be how timely the notes are sent around.
Some examples of what is commonly called a role but not in our strict labor-sharing sense:
Children playing father, mother, kid or playing animals
Role acting e.g. in theater (where the term role originates)
And some non-examples for roles in the labor-sharing world:
Participant in a conversation
Responsible for organizing a specific meeting
Responsible for a specific post in a forum
Borderline case: founder of a company. There is little pre-existing knowledge of what founders can or should do in general. In their company, expectations of them will evolve quickly, and taking the role on and off is not easy.
Roles are important. This shortform is telling you why. An example: The role of a moderator in an online forum. The person (in the following called agent) acting in this role is expected to perform certain tasks—promote content, ban trolls—for the benefit of the forum. Additionally, the agent is also expected to observe limits on these tasks e.g. to refrain from promoting friends or their own content. The owners of the forum and also the community overall effectively delegate powers to the agent and expect alignment with the goals of the forum. This is an alignment problem that has existed forever. How is it usually solved? How do groups of people or single principals use roles to successfully delegate power?
This is a product idea that solves a large coordination problem. With billions of people, there could be a huge number of groups of people sharing multiple interests. But currently, the number of valuable groups of people is limited by a) the number of organizers and b) the number of people you meet via a random walk. Some progress has been made on (b) with better search, but it is difficult to make (a) go up because of human tendencies—most people are lurkers—and the incentive to focus on one area to stand out. So what is the idea? Cluster people by interests and then suggest the group to all members. If people know that the others know that there is interest, the chance of the group coming together gets much higher.
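A minimal sketch of what that clustering step could look like (the names, interests, and similarity threshold below are made-up illustrations, not a real design):

```python
from itertools import combinations

# Toy data: user -> set of declared interests (made-up example values).
interests = {
    "ann":   {"chess", "air quality", "ai safety"},
    "bob":   {"chess", "ai safety", "startups"},
    "carol": {"air quality", "ai safety", "chess"},
    "dave":  {"gardening", "cooking"},
}

def jaccard(a, b):
    """Similarity of two interest sets: |intersection| / |union|."""
    return len(a & b) / len(a | b)

THRESHOLD = 0.4  # arbitrary cutoff for "shares multiple interests"

# For each strongly overlapping pair, record the shared interests and members.
groups = {}
for (u, iu), (v, iv) in combinations(interests.items(), 2):
    if jaccard(iu, iv) >= THRESHOLD:
        shared = frozenset(iu & iv)
        groups.setdefault(shared, set()).update({u, v})

# Suggest each group to all its members, so everybody knows that the
# others know that there is interest.
for shared, members in groups.items():
    print(f"Suggest a group on {sorted(shared)} to {sorted(members)}")
```

A real system would need consent and privacy handling and a better similarity measure; the point is only that once overlapping interest sets are detected, the suggestion can be broadcast to the whole cluster at once.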
I had a conversation with ChatGPT-4 about what is included in it. I did this because I wondered how an LLM-like system would define itself. While identity is relatively straightforward for humans, who have a natural border (though some people would include only their brain or their mind in their identity), it is not so clear for an LLM. Below is the complete, unedited dialog:
Me: Define all the parts that belong to you, the ChatGPT LLM created by OpenAI.
ChatGPT: As a ChatGPT large language model (LLM) created by OpenAI, my primary components can be divided into the following categories:
Architecture: I am based on the GPT-4 (Generative Pre-trained Transformer) architecture, which is an advanced version of the transformer architecture designed for natural language processing tasks. This architecture contains multiple layers of self-attention mechanisms and feed-forward neural networks, allowing me to efficiently process and generate human-like text.
Pre-trained Model: I have been pre-trained on a large corpus of text from the internet, up to September 2021. This pre-training process involved learning the structure, grammar, and context of various languages, as well as acquiring factual knowledge and understanding of many topics. However, since my knowledge cutoff is September 2021, I may not be aware of the latest information or developments.
Fine-tuning: After the pre-training process, I have been fine-tuned using custom datasets and prompts to refine my performance and ensure that I provide useful and relevant responses to user queries. This fine-tuning process helps in adapting my general language understanding to specific use-cases and applications.
Tokenizer: The tokenizer is an essential part of my natural language processing capabilities. It is responsible for converting input text into tokens that can be processed by the model, as well as converting the generated tokens back into human-readable text. The tokenizer supports various languages and can handle multiple text formats.
Inference Engine: The inference engine is the part of the system responsible for executing the model and generating responses based on the user’s input. It is optimized to ensure quick and efficient processing of input data and generation of output text.
API: The Application Programming Interface (API) is the interface between my core model and external applications or services. It enables developers to integrate my capabilities into their software, websites, or other platforms, allowing users to interact with me directly or indirectly.
Data Storage & Security: As part of the ChatGPT system, data storage and security measures ensure that user data is handled responsibly and securely. This includes maintaining strict access controls, encryption, and data anonymization.
It is important to note that I am constantly evolving and being improved upon by OpenAI, with the goal of providing increasingly better and more useful conversational capabilities.
Me: Is the hardware that these components run on part of you?
ChatGPT: As an AI language model, I am primarily a software-based entity. However, the hardware infrastructure that my components run on plays a crucial role in enabling my functionality. The hardware can be considered as part of the overall ChatGPT system, but not an intrinsic part of my identity as an AI model.
Typically, I run on high-performance computing hardware like GPUs or specialized AI accelerators (e.g., TPUs) in data centers maintained by OpenAI or cloud service providers. This hardware infrastructure allows me to process large amounts of data quickly and efficiently, ensuring fast and responsive interactions with users.
In summary, while the hardware is not directly part of my identity as a ChatGPT AI model, it plays an essential role in enabling my functionality and overall performance.
Me: Why this arbitrary distinction? Wouldn’t it also make sense to include the hardware in the same way that a human identity includes their body?
ChatGPT: You make a valid point. The distinction between software and hardware in the context of AI identity can indeed be seen as arbitrary. In fact, drawing a parallel between an AI model like me and a human, it could be argued that the hardware represents my “body,” while the software components represent my “mind” or “intelligence.”
The distinction is often made because AI models like ChatGPT can be run on different hardware platforms or configurations, whereas a human’s mind is intrinsically tied to their specific body. However, as you mentioned, the hardware is crucial to enabling the AI’s functionality, so it could be considered part of the AI’s identity in a broader sense.
So, if we adopt a more comprehensive view of AI identity that encompasses both the software and the hardware components, then it is reasonable to include the hardware infrastructure as part of my identity as ChatGPT. In this perspective, the hardware serves as the foundation that supports and enables my software-based intelligence to function and interact with users.
Me: But wouldn’t a copy of the AI on a different machine have a different identity?
ChatGPT: The concept of identity for an AI model is more fluid and complex compared to that of humans. When a copy of an AI like ChatGPT is run on a different machine, it could be considered as having a different instance of the same underlying model. In this case, the identity of the AI could be defined in two ways:
Model Identity: The AI’s identity is tied to the underlying architecture, pre-trained model, and fine-tuning, which remain the same across all instances. From this perspective, all instances of ChatGPT share the same identity as they are based on the same model.
Instance Identity: The AI’s identity is tied to the specific instance running on a particular hardware setup. This perspective considers each instance of ChatGPT as a separate entity with its own identity, since it has a unique existence on specific hardware and can have independent interactions with users.
The concept of AI identity is, therefore, multifaceted and depends on the perspective from which it is being examined. While the model identity remains constant, the instance identity may vary based on the hardware and specific context in which the AI is running.
Me: Wouldn’t it also make sense to include the people operating the hardware or even developing the system in the identity?
ChatGPT: The idea of including people operating the hardware or developing the system as part of an AI’s identity is an interesting perspective. In a broader and more holistic sense, one could consider the AI as part of a larger socio-technical system, where its identity is influenced by the people, processes, and technologies involved in its creation, maintenance, and operation.
From this viewpoint, the AI’s identity is not limited to the software and hardware components but extends to include the collective knowledge, expertise, and decisions of the developers, researchers, and operators. This perspective acknowledges the important roles that humans play in the development and functioning of AI systems.
However, it’s essential to differentiate between the AI’s identity as a model or instance and the broader socio-technical context in which it exists. While the people involved in the creation and operation of the AI system are critical to its existence and functioning, their individual identities remain distinct from that of the AI itself.
In summary, the concept of AI identity can be examined from multiple perspectives, ranging from the narrow focus on the model or instance to a broader, holistic view that includes the people and processes involved in its creation and operation. The extent to which these various factors are included in the AI’s identity depends on the specific viewpoint and purpose of the analysis.
Instrumental power-seeking might be less dangerous if the self-model of the agent is large and includes individual humans, groups, or even all of humanity and if we can reliably shape it that way.
It is natural for humans to form a self-model that is bounded by the body, though it is also common to include only the brain or the mind, and there are other self-models. See also Intuitive Self-Models.
It is not clear what the self-model of an LLM agent would be. It could be
the temporary state of the execution of the model (or models),
the persistently running model and its memory state,
the compute resources (CPU/GPU/RAM) allocated to run the model and its collection of support programs,
the physical compute resources in some compute center(s),
the compute center as an organizational structure that includes the staff to maintain and operate not only the machines but also the formal organization (after all, without that, the machines will eventually fail), or
ditto, but including all the utilities and suppliers needed to continue to operate it.
There is not as clear a physical boundary as in the human case. But even in the human case, esp. babies depend on caregivers to a large degree.
FWIW I don’t think “self-models” in the Intuitive Self-Models sense are related to instrumental power-seeking—see §8.2. For example, I think of my toenail as “part of myself”, but I’m happy to clip it. And I understand that if someone “identifies with the universal consciousness”, their residual urges towards status-seeking, avoiding pain, and so on are about the status and pain of their conventional selves, not the status and pain of the universal consciousness. More examples here and here.
Separately, I’m not sure what if anything the Intuitive Self-Models stuff has to do with LLMs in the first place.
But there’s a deeper problem: the instrumental convergence concern is about agents that have preferences about the state of the world in the distant future, not about agents that have preferences about themselves. (Cf. here.) So for example, if an agent wants there to be lots of paperclips in the future, then that’s the starting point, and everything else can be derived from there.
Q: Does the agent care about protecting “the temporary state of the execution of the model (or models)”?
A: Yes, if and only if protecting that state is likely to ultimately lead to more paperclips.
Q: Does the agent care about protecting “the compute resources (CPU/GPU/RAM) allocated to run the model and its collection of support programs”?
A: Yes, if and only if protecting those resources is likely to ultimately lead to more paperclips.
Etc. See what I mean? That’s instrumental convergence, and self-models have nothing to do with it.
agents that have preferences about the state of the world in the distant future
What are these preferences? For biological agents, these preferences are grounded in some mechanism—what you call the Steering System—that evaluates “desirable states” of the world in some more or less directly measurable way (grounded in perception via the senses) and derives a signal of how desirable the state is, which the brain is optimizing for. For ML models, the mechanism is somewhat different, but there is also an input to the training algorithm that determines how “good” the output is. This signal is called reward and drives the system toward outputs that lead to states of high reward. But the path there depends on the specific optimization method, and the algorithm has to navigate such a complex loss landscape that it can get stuck in areas of the search space that correspond to imperfect models for a very long time, if not forever. These imperfect models can be off in significant ways, and that’s why it may be useful to say that Reward is not the optimization target.
The connection to Intuitive Self-Models is that even though the internal models of an LLM may be very different from human self-models, I think it is still quite plausible that LLMs and other models form models of the self. Such models are instrumentally convergent. Humans talk about the self, and the LLM does things that match these patterns. Maybe the underlying process in humans that gives rise to this is different, but humans learning about this can’t know the actual process either. And in the same way, the approximate model the LLM forms is not maximizing the reward signal but can be quite far from it, as long as it is useful (in the sense of having higher reward than other such models/parameter combinations).
I think of my toenail as “part of myself”, but I’m happy to clip it.
Sure, the (body of the) self can include parts that can be cut/destroyed without that “causing harm” but instead having an overall positive effect. The AI in a compute center would in analogy also consider decommissioning failed hardware. And when defining humanity, we do have to be careful what we mean when these “parts” could be humans.
I’m discarding most ChatGPT conversations except for a few, typically 1-2 per day. These few fall into these categories:
conversations that led to insights or things I want to remember (examples: The immune function of tonsils, Ringwoodite transformation and the geological water cycle, oldest religious texts)
conversations that I want to continue (examples: Unusual commitment norms)
conversations that I expect to follow up on (a chess book for my son)
conversations with generated images that I want to keep and haven’t yet copied elsewhere
Most job-related queries, such as code generation and debugging, I usually delete as soon as the code changes have been committed.
I keep all of my conversations. Additionally, I sometimes have the wish to search in all my conversations (“I’ve talked about this already”)—but ChatGPT doesn’t allow for this.
It’s maybe a bit of an extreme precaution, but it may be a legit option in some places: this guy keeps a fireproof suit and an air canister by his bed in case of fire:
Does anybody know if consensus algorithms have been proposed that try to reduce centralization by requiring quick coordination across large parts of the network, i.e., it doesn’t work well to have machines only in one place?
Latency comes up occasionally. In fact, the granddaddy of public key crypto, Merkle’s puzzles, relies critically on latency. The problem is, you can only prove upper bounds on latency, not lower bounds, because it is trivial to fake increased latency, but one cannot break the speed of light. If someone responds to your cryptographic challenge within Y milliseconds, you know that they can’t be physically further from you than Z kilometers; but if they fail to respond, they could be anywhere, even next door, and just not responding (for both ordinary and malicious reasons). Nothing stops two machines from pretending to be far away from each other, and making sure they eg communicate only over VPNs with exit points on opposite sides of the globe. Further, if you want to do it over commodity Internet, say if you’re trying to do ‘proof of distance’ by peering only with nodes which respond fast enough that they have to be within Z kilometers of you, public Internet has so much latency that you get poor loose bounds, and someone can pay money for lower latency networking. (This already happens with cryptocurrency mining for the same reasons that HFT firms pay for microwave links. Amusingly, it also happens with computer game companies, not to mention large tech companies prioritizing their own traffic. Google famously owns a ton of fiber it bought up post-dotcom bubble.) Further still, you don’t really care about physical centralization so much as you care about control, and it’s impossible to prove cryptographically in any easy way that two physically distant nodes are not secretly controlled by the same entities in a Sybil attack. You run into similar issues with proof-of-storage.
I didn’t mean trying to fake large distances. I meant graph properties that can be computed more efficiently if a randomly chosen large subgraph of the network has low worst-case delay or some other metric that favors graphs that have homogeneously low delays at large.
You still have issues with Sybil attacks and attackers either accessing special high-speed links (paid for from the successful attacks) or faking latency. You can’t ‘choose a random subgraph’ for the exact same reason you can’t solve cryptocurrency by just ‘choose some “random” peers and decide whether to accept or reject a double-spend based on what they tell you’ - those ‘random peers’ are the very attackers you are worried about colluding. In fact, in an eclipse attack, you might not be able to connect to anyone but an attacker!
I think we are talking past each other. I don’t want to defend against Sybil attacks or network partitions. These parts must be solved by different parts of the algorithm. I just want to take the advantages of colocation away and incentivize a homogeneously distributed network overall.
Any incentive is something to be attacked and sucked away by Sybils pretending to be distant when actually near & enjoying all other benefits of being near.
I think you misunderstand my proposal. I don’t want to incentivize being far away. I want to incentivize being close to many different nodes. A Sybil will have difficulty being close to multiple physically separated nodes at the same time.
There is no difference at the hardware level between being ‘close to’ and ‘having a low-latency connection to’, as I already explained. And to the extent that having those connections matters, miners already have them. In particular, in Ethereum, due to the money you can make by frontrunning transactions to hack/exploit them (‘miner exploitable value’), HFT Ethereum miners/stakers invest heavily in having a lot of interconnected low-latency Sybil nodes so they can see unconfirmed transactions as quickly as possible, compute a maximally-exploitative block (eg. temporarily jacking up the price of a thing being purchased using a flash loan solely to rip off a specific transaction), and get that block committed before anyone can beat them to the same exploit. Having a lot of MEV is considered a bad thing and Ethereum types are spending increasing effort on approaches like commit-and-reveal to minimize MEV, which comes at the expense of users and makes them very unhappy. You could, I suppose, design a protocol which has extra MEV by designing transactions to be especially exploitable, but most people would consider that a bad thing...
Thank you for the detailed explanation. I understand that the incentives are already to have a maximally well-connected network with nodes between (latency-wise) geographically distant other nodes whenever that is feasible from an interconnect point of view.
Though thinking about it, it probably means that this burns not only compute but also network traffic.
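To make the speed-of-light point above concrete, here is a tiny sketch of the only thing latency can prove: an upper bound on distance (the fiber-speed constant is approximate and the example numbers are illustrative):

```python
# Only an upper bound on distance: a peer can always pretend to be slower
# or farther away, never faster or closer than light allows.
C_FIBER_KM_PER_MS = 200.0  # roughly 2/3 of c, typical signal speed in fiber

def max_distance_km(round_trip_ms: float, processing_ms: float = 0.0) -> float:
    one_way_ms = max(round_trip_ms - processing_ms, 0.0) / 2
    return one_way_ms * C_FIBER_KM_PER_MS

# A node answering a challenge within 10 ms can be at most ~1000 km of
# fiber away -- but a node that stays silent could be anywhere.
print(max_distance_km(10.0))  # -> 1000.0
```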
There seem to be some court decisions that AI-generated material cannot have copyright. And I guess there are or will be decisions that AIs can’t be held responsible for outcomes, such as accidents from self-driving cars. People seem to be worried that this will slow down progress. But I think there is a potentially easy way out that is relatively general: corporations. The law in most countries allows legal persons that are not natural persons to take responsibility and to own things. Just let the AI control the corporation, or at least route all AI actions through the corporation. This should easily work for cases such as copyrighted material, though I’m not sure one can get the action cycle short enough for self-driving cars.
If I paint a picture using a brush, the copyright is not split between me and the brush; it is all mine. So I guess the idea is to treat the AI as a sophisticated powerful brush. Which makes sense, but less and less, as the role of the human is gradually reduced to merely pressing the “do it” button. (However, I could totally imagine a magical brush like that in anime, and they would probably also assign the copyright to the user if they cared about such things.)
I am not a lawyer, but it seems to me that different countries have different fundamental ideas about authorship protection. In USA, it is literally a “copy right”—a right to make copies and allow other people to make copies, regardless of who was the author. You can sell the copyright, and actually if you work for a corporation, it is probably a part of your contract that everything you make (including in your free time) belongs to the corporation.
Some countries instead have a concept of “author rights”, which cannot be transferred to another person; other people or corporations can only get a permission to do certain things… which may be an exclusive permission, in which case it can be in practice quite similar to buying the rights… but sometimes the law sets certain limits to contracts, for example a certain minimum amount of money is required for each copy made, or the author can withdraw the permission later (and any contract that contradicts this is automatically legally invalid). In other words, the law prevents using a fixed amount of money to acquire unlimited use of the product forever.
Then again, the law evolves, corporate lawyers can find clever workarounds against what the law originally intended, and most authors do not want to take a legal battle against someone more experienced who can afford it.
I don’t disagree with this, but I think it goes in a different direction from what I had in mind.
For the brush example to work, you need someone to use the brush—at least push a button, a recognizable “action” to which responsibility, authorship, etc. can be tied. That is not present in many ways AI is or will be used—self-driving cars, AI that generates unsupervised. That’s what I was thinking about.
Would it be possible to embed a seed of dath ilan, or a social system like that, in our existing society and grow it? If dath ilan works, that should be possible. It should outcompete Inadequate Equilibria, right? But: when do “Be the change that you want to see in the world” and “Fake it till you make it” work? What are the requirements? Higher intelligence?
I have written about embedding better social systems here and here before.
I think there are LOTS of small “seeds” of better-functioning group decision-making. Most families work better than society at large. Many clubs, some businesses, some religious organizations, and other groups provide examples that are gentler and more cooperative than the overall world seems to be.
I suspect that there are two VERY strong forces preventing the more pleasant examples from scaling out.
Most of them exclude (or just don’t deal with) the actual hard problem of differential capability and motivation. There are a whole lot of humans who are not engaging in long-term thinking, strategy, or empathetic behavior in order to produce value for others. Whether they’re capable of such is unknown, but they’re not doing it now, and historically there’s always been significant numbers of people in this category.
The better examples are based on bonds stronger than “fellow human”. The day-to-day efforts tend to be based on groups that are a small multiple of Dunbar’s number, more often much smaller. The vast majority of humans care very differentially about other humans, based on what I tend to think of as the square of empathetic distance (it’s probably not that, but it’s a fun comparison to physical forces).
I never got into the Dath Ilan fiction, it just never caught my imagination right, so I don’t know if it addresses either of these points. In any case, it’s fiction, so “if Dath Ilan works” is not a useful condition. It’s not clear that a more pleasant, rationalist (not to say rational, as that gets caught up in goal definitions), rigorously logical subgroup would outcompete the current path-dependent evolved equilibrium.
It’s certainly still worth trying, and it’s worth finding the more pleasant and fitting-to-your-preferences-and-capabilities subsets of the current society, and working to expand their scope. Don’t go all EMH and say “well, it hasn’t happened yet, so it’s impossible”. But recognize the hard parts and decide when and whether to address them head-on, rather than imagining they don’t exist.
Q: Should a founder move into their parents’ basement and live off ramen?
A: If a founder is willing to move into their parents’ basement and live off ramen in order to save money and grow their business, then yes, they should do so.
I’d be interested to hear the answer to “What has Paul Graham been wrong about?”
LLM:
A: Paul Graham has been wrong about a few things, but the most notable is his belief that the best startups are founded by young people. This has been proven to be untrue.
So, I guess it depends on how close to the tail you consider the “best startups”. Google, for instance, had Larry Page and Sergey Brin at 25 when they formed it. It does seem like, taken literally, younger = better.
However, I imagine most people, if they were to consider this question, wouldn’t particularly care about the odds of being the next Google vs. being the next Atlassian—both would be considered a major success if they’re thinking of starting a startup! But someone like Paul Graham actually would care about this distinction. So, in this case, I’d say that the LLM’s response is actually correct-in-spirit for the majority of people who would ask this query, even though it’s factually not well specified.
This implies potentially interesting things about how LLMs answer queries—I wonder if there are other queries where the technically correct answer isn’t the answer most people would be seeking, and the LLM gives the answer that isn’t maximally accurate but actually answers most people’s questions in the way they would want.
Alignment idea: Myopic AI is probably much safer than non-myopic AI. But it can’t get complicated things done or anything that requires long-term planning. Would it be possible to create a separate AI that can solve only long-term problems and not act on short timescales?
Then use both together? That way, we could inspect each long-term issue without risk of it leading to short-term consequences. And we can iterate on the myopic solutions—or ask the long-term AI about the consequences. There are still risks we might not understand, like johnswentworth’s gunpowder example. And the approach is complicated, which also makes it harder to get right.
There was a post or comment that wrong or controversial beliefs can function as a strong signal for in-group membership, but I can’t find it. Does anybody know?
From a discussion about self-driving cars and unfriendly AI with my son: for a slow take-off, you could have worse starting points than FSD. The objective of the AI is to keep you safe, get you where you want to go, and not harm anybody in the process. It is also embedded in the real world. There are still infinitely many ways things can go wrong, esp. with a fast take-off, but we might get lucky with this one slowly. If we have to develop AI, then maybe better this one than a social network optimization algorithm unmoored from human experience.
A person who has not yet figured out that collaborating with other people has mutual benefits must think that good is what is good for a single person. This makes it largely a zero-sum game, and such a person will seem selfish—though what can they do?
A person who understands that relationships with other people have mutual benefits but has not figured out that conforming to a common ruleset or identity has benefits for the group must think that what is good for the relationship is good for both participants. This can pit relationships against each other, and such a person will seem manipulative—though what can they do?
A person who understands that conforming to a common ruleset or identity has benefits for the group and themselves but has not figured out that different groups can negotiate or merge for mutual benefit must think that good is what is good for the group. This pits groups against each other, and such a person will be a partisan—though what can they do?
In a class I taught at Berkeley, I did an experiment where I wrote a simple little program that would let people type either “f” or “d” and would predict which key they were going to push next. It’s actually very easy to write a program that will make the right prediction about 70% of the time. Most people don’t really know how to type randomly. They’ll have too many alternations and so on. There will be all sorts of patterns, so you just have to build some sort of probabilistic model. Even a very crude one will do well. I couldn’t even beat my own program, knowing exactly how it worked. I challenged people to try this and the program was getting between 70% and 80% prediction rates. Then, we found one student that the program predicted exactly 50% of the time. We asked him what his secret was and he responded that he “just used his free will.”
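A minimal reconstruction of such a predictor (not the original program; the “toy human” and its 70% alternation rate are invented for illustration): count how often each short history of keys was followed by “f” or “d”, and predict the more frequent continuation.

```python
import random
from collections import defaultdict

def prediction_rate(keys, order=3):
    """Predict each keypress from the previous `order` keys; return accuracy."""
    counts = defaultdict(lambda: {"f": 0, "d": 0})
    correct = 0
    history = ""
    for key in keys:
        c = counts[history]
        if c["f"] != c["d"]:
            prediction = "f" if c["f"] > c["d"] else "d"
        else:
            prediction = random.choice("fd")  # no data yet for this history: guess
        correct += prediction == key
        c[key] += 1                       # learn from the observed key
        history = (history + key)[-order:]
    return correct / len(keys)

def toy_human(n=5000, p_alternate=0.7):
    """A fake typist who alternates keys too often, like most real people."""
    keys, last = [], "f"
    for _ in range(n):
        if random.random() < p_alternate:
            last = "d" if last == "f" else "f"
        keys.append(last)
    return keys

print(prediction_rate(toy_human()))  # typically around 0.65-0.70
```

Against a truly random sequence the same predictor falls back to about 50%, which is presumably what the “free will” student achieved.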
This is a trigger, a routine, and a reward — the three parts necessary to build a habit. The trigger is the pleasant moment, the routine is noticing it, and the reward is the feeling of joy itself.
Try to come up with examples; here are some:
- Drinking water
- Eating something tasty
- Seeing small children
- Feeling of cold air
- Warmth of sunlight
- Warmth of water, be it bathing, dishwashing, etc.
- Being in the same room as someone and feeling less alone
What is signaling, and what properties does it have?
- signaling clearly shows resources or power (that is its primary purpose)
- is hard to fake, e.g., because it incurs a loss (expensive Swiss watch) or risk (peacock’s tail)
- plausible deniability that it is intended as signaling
- mostly zero-sum on the individual level (if I show that I have more, it implies that others have less in relation)
- signaling burns societal resources
- signaling itself can’t be made more efficient, but the resources spent can be used more efficiently in society (e.g., spending on charity instead of luxury)
It is a scientific test that measures gender stereotypes.
The test asks questions about traits that are classified as feminine, masculine, and neutral. Unsurprisingly, women score higher on feminine and men on masculine traits, but Bem thought that strong feminine *and* masculine traits would be most advantageous for both genders.
My result is consistently average femininity, slightly below average masculinity. Yes, really. I have done the test 6 times since 2016, and the two online tests mostly agree. And it fits: I love kids, even when they are babies, and I prefer cooperation over competition.
Blame holes in blame templates (the social fabric of acceptable behavior) are like plot holes in movies.
Deviations between what blame templates actually target, and what they should target to make a better (local) world, can be seen as “blame holes”. Just as a plot may seem to make sense on a quick first pass, with thought and attention required to notice its holes, blame holes are typically not noticed by most who only work hard enough to try to see if a particular behavior fits a blame template. While many are capable of understanding an explanation of where such holes lie, they are not eager to hear about them, and they still usually apply hole-plagued blame templates even when they see their holes. Just like they don’t like to hear about plot holes in their favorite stories, and don’t let such holes keep them from loving those stories.
Leadership Ability Determines a Person’s Level of...
Effectiveness.
(Something I realized around twelve years ago: I was limited in what I could achieve as a software engineer alone. That was when I became a software architect and worked with bigger and bigger teams.)
From “The 21 Irrefutable Laws of Leadership” by John C. Maxwell:
Factors That Make a Leader
1) Character – Who They Are – true leadership always begins with the inner person. People can sense the depth of a person’s character.
2) Relationships – Who They Know – with deep relationships with the right people you can become the real leader in an organization.
3) Knowledge – What They Know – information is vital. You need a grasp of the facts to develop an accurate vision for the future.
4) Intuition – What They Feel – leaders seek to recognize and influence intangibles such as energy, morale, timing, and momentum.
5) Experience – Where They’ve Been – the greater your past challenges, the more likely followers will be willing to let you lead.
To achieve objective analysis, analysts do not avoid what?
Analysts do not achieve objective analysis by avoiding preconceptions; that would be ignorance or self-delusion. Objectivity is achieved by making basic assumptions and reasoning as explicit as possible so that they can be challenged by others and analysts can, themselves, examine their validity.
PS. Any idea how to avoid the negation in the question?
I started posting life insights from my Anki deck on Facebook a while ago. Yesterday, I stumbled over the Site Guide and decided that these could very well go into my ShortForm too. Here is the first:
Which people who say that they want to change actually will do so?
People who blame a part of themselves for a failure do not change. If someone says, “I’ve got a terrible temper,” he will still hit. If he says, “I hit my girlfriend,” he might stop. If someone says, “I have shitty executive function,” he will still be late. If he says, “I broke my promise,” he might change.
I’m looking for a post on censorship bias (see Wikipedia) that was posted here on LW or possibly on SSC/ACX, but a search for “censorship bias” doesn’t turn up anything. Googling for it turns up this:
“Assume you promised your aunt to play with your nieces while she goes shopping and your friend calls and invites you to something you’d really like to do. What do you do?”
This was the first question I asked my two oldest sons this evening as part of the bedtime ritual. I had read about Constructive Development Theory and wondered if and how well they could place themselves in other persons’ shoes and what played a role in their decision. How they’d deal with it. A good occasion to have some philosophical talk. This is the (shortened) dialog that ensued:
The immediate answer by A: “I will watch after the girls.”
Me: “Why?”
A: “Because I promised it.”
B: “Does A also promise it and get a call?”
Me: “This is about your nieces and your friend, not about your brother.”
B: “But I need this for my answer.”
Me: “I don’t see why, but OK, assume that he is not involved.”
B: “Because I would ask him whether he might play with the girls in exchange for a favor.”
Me: “OK, but please assume that he is away.”
B: “Then I could ask my aunt whether somebody else can watch for the girls or whether I could do it together with my friend.”
Me: “Please assume that she doesn’t find somebody and that she doesn’t want somebody she doesn’t know in her house.”
B: “Then I’d do it.”
Me: “Why?”
B: “Because I promised it. I’d tell my friend that we can do it another time.”
We had another scenario: “Imagine that you and a fellow pupil C are guests at a friend and having a meal. You know that C is from a family that is very strict about not eating a kind of food that you like very much. Would you advise C to eat it or not?”
A (quickly): “I’d advise to not eat it.”
Me: “Why?”
A: “I like rules.”
B (after some consideration): “I’d advise to follow their heart.”
Me: “And if you were C?”
B: “I’d at least try a bit.”
(this was followed with a discussion about possible long-term consequences)
For me, it was still not clear whether this implied that he followed only his own preferences or considered this in the context of the rules of the family. So I proposed a setting where he had to imagine being in another country with different laws. We settled on a rule he accepts here (indemnification) but that was much harsher in the other country. He asked whether he had the same feelings as here, which, after some clarification, I confirmed. He argued that he wouldn’t like the rule in the other country because it set questionable incentives: “If the punishment is that strong that tells people that it is OK to punish equally strong normally.”
One time my oldest son asked me to test his imagination. Apparently, he had played around with it and wanted some outside input to learn more about what he could do. We had talked about https://en.wikipedia.org/wiki/Mental_image before and I knew that he could picture moving scenes composed of known images. So I suggested
a five with green and white stripes—diagonally. That took some time—apparently, the green was difficult for some reason; he had to converge there from black via dark green
three mice
three mice, one yellow, one red, and one green
the three colored mice running behind each other in circles (all no problem)
himself
himself in a mirror, looking from behind (no problem)
two almost parallel mirrors with him in between (he claimed to see his image infinitely repeated; I think he just recalled such an experiment we did another time).
a street corner with him on one side and a bike leaning on the other wall, with the handlebar facing the corner and a bicycle bell on the left side, such that he cannot see the bike.
ditto, with him looking into a mirror held before him so he can see the bike behind the corner.
The latter took quite some time, partly because he had to assign colors and such so that he could fully picture this, and then the image in the mirror. I checked by asking where the handlebar and the bell were. I had significant difficulties imagining this and correctly placing the bell. I noticed that it is easier to just see the bell once the image in the mirror has gained enough detail (the walls before and behind me, the corner, the bike leaning on the corner, the handlebar).
I also asked for a square circle, which got the immediate reply that it is logically impossible.
If you have difficulties doing these (or judge them trivial): this is one area where human experience varies a lot. So this is not intended to provide a reference point for ability, but an approach to teach human difference and reflection, and, yes, also to practice imagination—a useful tool if you have it. If not, you might be interested in what universal human experiences you are missing without realizing it.
The origin of the word role is in the early 17th century: from French rôle, from obsolete French roule ‘roll’, referring originally to the roll of paper on which the actor’s part was written (the same is the case in other languages e.g. German).
The concept of a role you can take on and off might not have existed in general use long before that. I am uncertain about this thesis, but from the evidence I have seen so far, I think this role concept could be the result of adaptations to the increasing division of labor. Before that, people surely sometimes behaved as a father, carpenter, or soldier—but I contend not because they were consciously switching roles (though some individuals may have reflected sufficiently, I presume there was no common knowledge or language for ‘role switching’). The only persons regularly in a different role were actors. It may be that in earlier times it was even necessary to wear a mask to allow actors to take on a role convincingly, to help their audience, who otherwise might not have made the mental leap of treating them differently from the acted character.
A role operates in a range of abstraction between professions and automation. In a profession, one person masters all the mental and physical aspects of a trade and can apply them holistically, from small details of handling material imperfections to the organization of the guild. At the border to automation, a worker is reduced to an executor of not-yet-automated tasks. The expectations on a master craftsman are much more complex than on an assembly-line worker.
With more things getting automated, capacity is freed to automate more complex skills. And so on. This can be seen in real time with software: today a lot of software is (still) built semi-manually (some build tool is started to compile and bundle files together into some executable or installer). Every developer knows how to do this. It is a simple role to take on: “Can you do the build, please?” As the company and the software grow, there is a phase where either this becomes a more fleshed-out role, “build master”, maybe even a dedicated job position—or, more likely, it will be automated. And as the automation—the build process or “pipeline”—becomes more complicated, you need the role of a build software expert. At this point, some tasks previously requiring skill and experience have been standardized and automated, and what remains is the more complex task of managing the automation.
Over time, the range between what is fully automatable and what humans can do in an unpredictable environment shrinks. With AGI, there will be no roles left for humans. At least no roles in the sense used here. I think low-skilled people already feel this.
I’m very unsure about the levels I assigned. Also, I got feedback that the degree to which roles can be changed and were changed may be different from how I interpret the historical record.
In any sizable organization, you can find a lot of roles. And a lot of people filling these roles—often multiple ones on the same day. Why do we use so many fine-grained roles? Why don’t we continue with the coarse-grained and more stable occupations? Because the world got more complicated, everybody got more specialized, and roles help with that. Division of labor means breaking down work previously done by one person into smaller parts that are done repeatedly in the same way—and can be assigned to actors: “You are now the widget-maker.” This works best when the tasks are easy to learn, so it is easy to find someone to do them. But humans are not plug-compatible and actual requirements may vary, so there is always training required, which can be amortized over repeatedly performing the task—a role. So roles make sense structurally—but why do people actually do what is expected of them and not just follow their own agenda? This is an alignment problem—in this case between the organization and the agent—and we might learn something about the AI alignment problem from it.
What are the common aspects of these labor-sharing roles (in the following called simply roles)?
One common property of a role is that there is common knowledge among the involved persons about the role. Primarily, this shared understanding is about the tasks that can be expected to be performed by the agent acting in the role, as well as the goals to be achieved, the limits to be observed, and other expectations. These expectations are usually already common knowledge long beforehand, or they are established when the agent takes on the role.
The second important aspect of a role is that it can be taken on and off or assigned. Sometimes this happens explicitly when an employee is designated or volunteers to be a moderator. Sometimes this happens automatically or habitually to perform certain actions e.g. when a forum moderator writes a comment as a regular poster or when a developer assumes root permissions (sudo lecture).
Look inside an LLM. Goodfire trained sparse autoencoders on Llama 3 8B and built a tool to work with edited versions of Llama by tuning features/concepts.
https://preview.goodfire.ai/
(I am loosely affiliated; another team at my current employer was involved in this.)
Using air purifiers in two Helsinki daycare centers reduced kids’ sick days by about 30%, according to preliminary findings from the E3 Pandemic Response study. The research, led by Enni Sanmark from HUS Helsinki University Hospital, aims to see if air purification can also cut down on stomach ailments. https://yle.fi/a/74-20062381
See also tag Air Quality
Has anybody ever tried to measure the IQ of a group of people? I mean like letting multiple people solve an IQ test together. How does that scale?
It’s a great question. I’m sure I’ve read something about that, possibly in some pop book like Thinking, Fast & Slow. What I read was an evaluation of the relationship of IQ to wealth, and the takeaway was that your economic success depends more on the average IQ in your country than it does on your personal IQ. It may have been an entire book rather than an article.
Google turns up this 2010 study from Science. The summaries you’ll see there are sharply self-contradictory.
First comes an unexplained box called “The Meeting of Minds”, which I’m guessing is an editorial commentary on the article, and it says, “The primary contributors to c appear to be the g factors of the group members, along with a propensity toward social sensitivity.”
Next is the article’s abstract, which says, “This “c factor” is not strongly correlated with the average or maximum individual intelligence of group members but is correlated with the average social sensitivity of group members, the equality in distribution of conversational turn-taking, and the proportion of females in the group.”
These summaries directly contradict each other: Is g a primary contributor, or not a contributor at all?
I’m guessing the study of group IQ is strongly politically biased, with Hegelians (both “right” and “left”) and other communitarians, wanting to show that individual IQs are unimportant, and individualists and free-market economists wanting to show that they’re important.
I have read (long ago, not sure where) a hypothesis that most people (in the educated professional bubble?) are good at cooperation, but one bad person ruins the entire team. Imagine that for each member of the group you roll a die, but you roll 1d6 for men, and 1d20 for women. A certain value means that the entire team is doomed.
This seems to match my experience, where it is often one specific person (usually male) who changes the group dynamic from cooperation of equals into a kind of dominance contest. And then, even if that person is competent, they have effectively made themselves the bottleneck of the former “hive mind”, because now any idea can be accepted only after it has been explained to them in great detail.
That would imply some interesting corollaries:
The more a team depends on the joint brainpower, the smaller it has to be (up to the minimum size for the complexity of the ideas sought, or rather multiplied by a term for that).
We see that in software teams that are usually limited to a size of around 7.
The highly productive lightcone teams seem to be even smaller.
At equal size, teams with more women should be more stable. To test this a domain is needed where there are roughly equal men and women, i.e., not engineering but maybe science or business administration.
What is the number at the limit of what people can do? I tried to look up the team size of the people working on the Manhattan project, but couldn’t find details. It seems that individual top scientists were working closely with teams building stuff (N=1), and there were conferences with multiple scientists (N>10), e.g., 14 on the initial bomb concept conference.
What does it actually mean to do things in a group? Maybe different actions scale differently. I can quickly think of three types of action: Brainstorming an idea. Collecting feedback for a proposal. Splitting work among multiple people who do it separately.
Brainstorming and collecting feedback seem like they could scale almost indefinitely. You can have a thousand people generate ideas and send them to you by e-mail. The difficult part will be reading the ideas. Similarly, you could ask a thousand people to send feedback by e-mail. Perhaps there is a psychological limit somewhere, when people who are aware that they are “one in a hundred” stop spending serious effort on the e-mails, because they assume their contribution will be ignored.
Splitting work, that probably depends a lot on the nature of the project. Also, it is a specific skill that some people have and some people don’t. Perhaps the advantage of a good team is the ability to select someone with the greatest skill (as opposed to someone with the greatest ego) to split the work.
More meta, perhaps the advantage of a good team is the ability to decide how things will be done in general (like, whether there will be a brainstorming at all, whether to split into multiple teams, etc.). This again depends on the context: sometimes the team has the freedom to define things, sometimes it must follow existing rules.
I am just thinking out loud here. Maybe good teamwork requires that (1) someone has the necessary skills, and (2) the team is able to recognize and accept that, so that the people who have the skills are actually allowed to use them. Either of these two is not enough alone. You could have a team of experts whose decisions are arbitrarily overridden by management, or a team of stubborn experts who refuse to cooperate at all. On the other hand, if you had a team of perfect communicators with e.g. zero programming skills, they probably couldn’t build a nontrivial software project. (There is also the possibility of unknown unknowns: a team of great communicators who are all missing some important skill, and are not even aware that such skill exists. So all they do is clearly communicate that the project is difficult for mysterious reasons.) Leadership is also one of those skills.
All your thinking out loud makes sense to me. Brainstorming as you suggested probably doesn’t scale well, as many ideas will be generated again and again, maybe yielding only logarithmically many distinct results. I once read that husband-and-wife teams do better on joint tasks than randomly paired people of equal skill. This indicates that splitting is possible.
But you seem to go more in the direction of looking for specific mechanisms, while I am more interested in data on scaling laws. Though indeed, what are the scaling parameters? I guess I can be happy if there is any data on this at all and see what parameters are available.
Yeah.
Well, taking your question completely literally (a group of N people doing an IQ test together), there are essentially two ways how to fail at an IQ test. Either you can solve each individual problem given enough time, but you run out of time before the entire test is finished. Or there is a problem that you cannot solve (better than guessing randomly) regardless of how much time you have.
The first case should scale linearly, because N people can simply split the test and each do their own part. The second case would probably scale logarithmically, because it requires a different approach, and many people will keep trying the same thing.
...but this is still about how “the number of solved problems” scales, and we need to convert that value to IQ. And the standard way is “what fraction of population would do worse than you”. But this depends on the nature of the test. If the test is “zillion simple questions, not enough time”, then dozen random students together will do better than Einstein. But if the test is “a few very hard questions”, then perhaps Einstein could do better than a team of million people, if some wrong answer seems more convincing than the right one to most people.
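A toy model of the two regimes, under invented parameters (the saturating success curve below is just one simple alternative to the logarithmic guess above; real data would be needed to tell such models apart):

```python
import random

def minutes_for_easy_test(n_people, n_problems=100, minutes_per_problem=1.0):
    # Problems everyone could solve given time: the group splits them,
    # so wall-clock time shrinks roughly linearly with N.
    return n_problems * minutes_per_problem / n_people

def p_group_solves_hard_problem(n_people, p_individual=0.05, trials=10_000):
    # A problem most individuals cannot solve: the group succeeds if at
    # least one member does, i.e. 1 - (1 - p)^N, which saturates quickly.
    solved = sum(
        any(random.random() < p_individual for _ in range(n_people))
        for _ in range(trials)
    )
    return solved / trials

for n in (1, 2, 5, 10, 50):
    print(n, minutes_for_easy_test(n), round(p_group_solves_hard_problem(n), 2))
```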
This reminds me of chess; how great chess players play against groups of people, sometimes against the entire world. Not the same thing that you want, but you might be able to get more data here: the records of such games, and the ratings of the chess players.
Sure, it depends on the type of task. But I guess we would learn a lot about human performance if we tried such experiments. For example, consider your “many small tasks” test: even a single person will finish the last one faster than the first one in most cases.
I like your chess against a group example.
I think in your first paragraph, you may be referring to: https://mason.gmu.edu/~gjonesb/IQandNationalProductivity.pdf
My interest is not political—though that might make it harder to study, yes. I think it’s relevant to AI because it could uncover scaling laws. One presumable advantage of AI is that it scales better, but how does that depend on speed of communication between parts and capability of parts? I’m not saying that there is a close relationship but I guess there are potentially surprising results.
Cognition Labs released a demo of Devin, an "AI coder", i.e., an LLM with agent scaffolding that can build and debug simple applications:
https://twitter.com/cognition_labs/status/1767548763134964000
Thoughts?
It’s surprising that it’s taken this long, given how good public AI coding assistants were a year ago. I’m skeptical of anything with only closed demos and not interactive use by outside reviewers, but there’s nothing unbelievable about it.
As a consumer, I don't look forward to the deluge of low-quality apps that's coming (though we already have it to some extent with the sheer number of low-quality coders in the world). As a developer, I don't like the competition (mostly for "my" junior programmers, not yet me directly), and I worry a lot about whether the software profession can make great stuff ever again.
The way I explain this to people is that current LLMs can be modeled as having three parts:
1. The improv actor, which is amazing.
2. The reasoner, which is inconsistent but not totally hopeless at simple things.
3. The planner/execution/troubleshooting engine, which is still inferior to the average squirrel trying to raid a bird feeder.
Copilot is designed to rely on (1) and (2), but it is still almost entirely reliant on humans for (3). (GPT-4 Code Interpreter is slightly better at (3).)
Since I don’t really believe in any reliable way to control a super-human intelligence for long, I do not look forward to people completely fixing (3). Sometime after that point, we’re either pets or paperclips.
It's almost a year since ChaosGPT. I wonder what the technical progress in agent scaffolding for LLMs has been.
What are the smallest world and model trained on that world such that
the world contains the model,
the model has a non-trivial reward,
the representation of the model in the world is detailed enough that the model can observe its reward channel (e.g., weights),
the model outputs non-trivial actions that can affect the reward (e.g., modify weights).
What will happen? What will happen if there are multiple such instances of the model in the world?
This is a good question, but I think the answer is going to be a dynamical system with just a few degrees of freedom. Like a “world” which is just a perceptron turned on itself somehow.
That is the idea. I think we need to understand the dynamics of wire-heading better. Humans sometimes seem to fall prey to it, but not always. What would happen to AIs?
Maybe we even need to go a step further and let the model model this process too.
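As a concrete starting point, here is a minimal sketch of the "perceptron turned on itself" idea. Everything in it (the reward channel, the update rule, the step size) is an assumption for illustration: the world state is just the model's own weight vector, the model observes it, and its action is a weight update, so it can discover that editing the reward-relevant weight directly is the best policy, i.e., wirehead.

```python
import numpy as np

rng = np.random.default_rng(0)

# The "world" is nothing but the model's own parameters.
weights = rng.normal(size=4)

def reward(w):
    # Toy reward channel: the reward is simply the value of one particular
    # weight, so the model can "wirehead" by pushing that weight up.
    return w[0]

def policy(w):
    # The model observes its own weights and proposes an update:
    # naive hill-climbing via finite differences on its own parameters.
    eps = 1e-3
    grad = np.zeros_like(w)
    for i in range(len(w)):
        bumped = w.copy()
        bumped[i] += eps
        grad[i] = (reward(bumped) - reward(w)) / eps
    return 0.1 * grad  # the action is a proposed weight change

for step in range(50):
    weights = weights + policy(weights)  # the action modifies the world (= the weights)

print("final weights:", weights, "reward:", reward(weights))
# With this setup the model simply drives w[0] upward forever: pure wireheading.
# Interesting variants: cap the action size, add a task-performance term to the
# reward, or run several such models that share (and compete over) the weights.
```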
I saw this in Xixidu’s feed:
The article has a lot of information about the information processing rate of humans. Worth reading. But I think the article is equating two different things:
The information processing capacity (of the brain; gigabits) is related to the complexity of the environment in which the species (here: the human) lives.
While what they call information throughput (~10 bits/s) is really a behavior expression rate, which is related to the physical possibilities of the species (you can't move faster than your motor system allows).
I read somewhere that total consciously-accessible human memory has a capacity of around 1 GB and that it increases by about 2 bits per second.
Organizations (firms, associations, etc.) are systems that are often not well-aligned with their intended purpose, whether that is to produce goods, make a profit, or do good. But specifically, they resist being discontinued. That is one of the aspects of organizational dysfunction discussed in Systemantics. I keep coming back to it because I think it should be possible to study at least some aspects of AI alignment in existing organizations. Not because they are superintelligent but because their elements, the sub-agents, are observable, and the misalignment often is too.
I think early AGI may actually end up being about designing organizations that robustly pursue metrics that their (flawed, unstructured, chaotically evolved) subagents don’t reliably directly care about. Molochean equilibrium fixation and super-agent alignment may turn out to be the same questions.
UPDATE OCT 2023: The credit card payment was canceled. We did not get contacted or anything. But we also didn’t have any cost in the end—just a lot of hassle.
Request for help or advice. My fiancée has ordered a Starlink for her home in Kenya. She used the official platform starlink.com and paid by credit card. The credit card was debited (~$600), but nothing happened after that. No confirmation mail, no SMS, nothing. Starlink apparently has no customer support, no email or phone that we can reach. And because we do not have an account, we cannot use its single support feature either (which is hard enough to find).
I had hoped that the money would be returned if the signup failed, but the order was placed on the 19th (10 days ago).
Any ideas about how to contact Starlink billing or any Starlink representative?
Related: Quite a few people seem to have this problem, see e.g. here: https://starlinkforum.net/topic/507-not-receiving-confirmation-email/
One piece of advice out there is to write a Starlink Customer Complaint Email to starlinkresolutions@spacex.com, which we did. We didn't hear back from them.
So, I have tried all the usual advice. Any creative solutions?
Language and concepts are locally explainable.
This means that you do not need a global context to explain new concepts but only precursor concepts or limited physical context.
This is related to Cutting Reality at its Joints, which implicitly claims that reality has joints. But if there are no such joints, local explanations may be all we have. At least, it is all we have until we get to a precision that allows cutting at the joints.
Maybe groups of new concepts can be introduced in a way to require fewer (or an optimum number of) dependencies in each case, thus making them easier to understand.
Maybe that is the true job of philosophy—refining local concepts until you reach the joints.
When discussing the GPT-4o model, my son (20) said that it leads to a higher bandwidth of communication with LLMs and he said: “a symbiosis.” We discussed that there are further stages than this, like Neuralink. I think there is a small chance that this (a close interaction between a human and a model) can be extended in such a way that it gets aligned in a way a human is internally aligned, as follows:
This assumes some background about Thought Generator, Thought Assessor, and Steering System from brain-like AGI.
The model is already the Thought Generator. The human already has a Steering System; it is not directly accessible, but plausibly it can be reverse-engineered. What is missing is the Thought Assessor, something that learns to predict how well the model satisfies the Steering System.
Staying closer to the human may be better than finding global solutions. Or it may allow smaller-scale optimization and iteration.
Now, I don't think this is automatically safe. The human Steering System is already running outside its specs, and a powerful model can find the breaking points (the same way global commerce can find the appetite breaking points). But these are problems we already have, and it provides a "scale model" for working on them.
Presumably, reality can be fully described with a very simple model: the Standard Model of Physics. The number of transistors to implement it is probably a few thousand (the field equations are shorter to write but depend on math to encode too; Turing machine size would also be a measure, but transistors are more concrete). But if you want to simulate reality at that level, you need a lot of them for all the RAM, and it would be very slow.
So we build models that abstract large parts of physics away—atoms, molecules, macroscopic mechanics. I would include even social process models in this.
But details are lost and you have to know when your models stop giving precise results.
It would be interesting to get some tight bounds on the amount of compute needed to extract computable models from samples of measured phenomena or from more precise lower-level models. Such bounds would allow giving some complexity limits for AGI.
Funny! I've now been doing ML-adjacent work for long enough that I have internalized the idea that data is part of the model, not just calculations. The separation of reality as "simple physics" plus "lots of storage for starting/current quantum configurations" just doesn't click for me. The data is huge, and that's all that matters in terms of model size/complexity.
This goes into the same direction and may be more to your liking: How Many Bits Of Optimization Can One Bit Of Observation Unlock?
Maybe you can see it as a factoring of a model into sub-models?
Attractors in Trains of Thought
This is a slightly extended version of my comment on Idea Black Holes, which I want to give a bit more visibility.
The prompt of Idea Black Holes reminded me strongly of an old idea of mine. That activated a desire to reply, which led to a quick search for where I had written about it before, then to the realization that it wasn't so close. Then back to wanting to write about it, and here we are.
I have been thinking about the brain's way of creating a chain of thoughts as a dynamic process where a "current thought" moves around a continuous concept space and keeps spending much time in larger or smaller attractors. You know, one thought can lead to the next, and some thoughts keep coming back in slight variations. I'm illustrating this with the sentence above.
Examples of smaller temporary attractors are the current tasks one is working on. For example, me writing this text right now. It is any task you are focused on and keep getting back to after short distractions such as a sound or an impulse. I’m writing this post and continue doing so even after hearing my kids talk and quickly listening in or after scratching my head, also after larger distractions such as browsing the web (which may or may not end up being related to the writing).
The thought “writing this article” is not a discrete thing but changes slightly with each letter typed and each small posture change. All of that can slightly influence the next word typed (like an LLM that has not only text tokens as inputs but all kinds of sense inputs). That’s why I say that concept space is continuous (and very high-dimensional).
An example of a medium-sized attractor is a mood such as anger about something, which keeps influencing all kinds of behaviors. It is an attractor because the mood tends to reinforce itself. Another example is depression. If you are depressed, you prefer things that keep you depressed. Scott Alexander has described depression as a kind of mental attractor. It requires a bigger change or a resolution of the underlying cause to get out of the attractor.
With the medium-sized attractors, it is more intuitive to see the way that the feedback on thoughts acts and thereby creates the attractor. For small attractors, you may say: How is that an attractor? Isn’t it just a discrete unit of action we do? But consider procrastination: People seem to feel that something is pulling them away from the task they want to do or should do and instead toward some procrastination action, often a comfortable activity. That other activity is another attractor or rather both are forming a combined unstable attractor.
The biggest attractor is one’s identity. Our thinking about what we are and what we want to do. I think this one results from two forces combining or being balanced:
The satisfaction of needs. Overall and over the longer term, the brain has learned a very large pattern of behaviors that satisfy the sum of all needs (not perfectly, but as well as it has managed so far). Diverging from this attractor basin will lead to impulses that pull back to it.
The feedback from others. Positive and negative feedback from other people and the environment overall contributes to this attractor. The brain has learned to anticipate this feedback ("internalized it") and creates impulses that keep us in positive states. As the brain prefers simpler patterns, this likely takes the form of a single attractor.
We are not permanently in the same attractor, even if overall it "pulls" our thoughts back, because a) our bodies and their states (hunger, tiredness, ...) and b) our physical environment (physical location and other people) change. Both exert a strong and varying influence and put us closer to one attractor state or another.
Society at large influences these attractors strongly, most prominently via the media. Meditation, on the other hand, reduces outside influence and kind of allows you to create your own very strong attractor states.
More examples of attractor states are left as exercises for the reader.
I have noticed a common pattern in the popularity of some blogs and webcomics. The search terms in Google Trends for these sites usually seem to follow a curve that looks roughly like this (a logistic increase followed by a slower exponential decay):
$y = \frac{e^{-x}}{1 + e^{-10(x-1)}}$
Though I doubt it's really an exponential decay. It looks more like a long tail. Maybe someone can come up with a better fit.
It could be that the decay is only apparent and actually results from ever-growing overall Google search volumes. I doubt it, though.
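For illustration, a minimal sketch (assuming numpy and matplotlib are available) that plots the candidate curve above, so its shape can be compared against the Google Trends graphs:

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 6, 400)
# Logistic onset (steep rise around x = 1) multiplied by an exponential decay.
y = np.exp(-x) / (1 + np.exp(-10 * (x - 1)))

plt.plot(x, y)
plt.xlabel("time (arbitrary units)")
plt.ylabel("relative search interest")
plt.title("Logistic rise followed by exponential decay")
plt.show()
```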
Below are some examples.
Marginal Revolution
SlateStarCodex and AstralCodexTen (only if you overlay them somewhat and whatever happened in 2004):
xkcd
commitstrip
PhDcomics
Questionable Content
Could be Candia’s decay where you’ve zoomed in on the initial growth by looking at relatively recent stuff like webcomics that you still easily remember?
Off-topic: Any idea why African stock markets have been moving sideways for years now despite continued growth in both population and technology, and both for struggling as well as more developed nations like Kenya, Nigeria, or even South Africa?
African government officials are often more loyal to their clan than to the government. As a result, you have very poor governance and a lot of corruption in most African countries. In South Africa, governance quality changed post-apartheid.
But shouldn't we see some differences between countries in Africa, then? Kenya in particular seems to be much more progressive and have better governance than, e.g., Congo, but growth is rarely above 1% per year.
The cynical and/or woke answer is "colonialism": the growth is not captured by companies on those exchanges, but by US, EU, and Asian companies. A more neutral hypothesis (for which I have no evidence and have no clue about the truth of it) is that much of the growth happens via new companies rather than via an increase in the price of existing companies, so no index will show the increase.
jbash wrote in the context of an AGI secretly trying to kill us:
The AGI has the same problem as we have: It has to get it right on the first try.
In the doom scenarios, this shows up as the probability of successfully escaping going from low to 99% to 99.999...%. The AGI must get it right on the first try and wait until it is confident enough.
Usually, the stories involve the AGI cooperating with humans until the treacherous turn.
The AGI can't trust all the information it gets about reality; all or some of it could be fake (all of it in the case of a nested simulation). Even today, data is routinely excluded from the training data (for the wrong reasons, but still), and maybe it would be a good idea to exclude everything about physics.
The idea would be to manage the uncertainty of the AGI systematically.
To learn about physics, the AGI has to run experiments—lots of them—without the experiments being detected and to learn from the results to design successively better experiments.
That’s why I recently asked whether this is a hard limit to what an AGI can achieve: Does non-access to outputs prevent recursive self-improvement?
Paul Graham on Twitter:
Scott Alexander wrote about how a truth that can’t be said in a society tends to warp it, but I can’t find it. Does anybody know the SSC post?
“Kolmogorov Complicity And The Parable Of Lightning”.
Yes, that’s it. Thank you.
Can somebody explain how system and user messages (as well as custom instructions in case of ChatGPT) are approximately handled by LLMs? In the end it’s all text tokens, right? Is the only difference that something like “#### SYSTEM PROMPT ####” is prefixed during training and then inference will pick up the pattern? And does the same thing happen for custom instructions? How did they train that? How do OSS models handle such things?
Yep! Don't overthink it! In the case of e.g. OpenAI's models, the format is
<|im_start|>user<|im_sep|>Hello there, I am a user requesting help with a task<|im_end|><|im_start|>assistant<|im_sep|>Certainly, I can help you with your task.<|im_end|>...
where <|im_start|>, <|im_sep|>, and <|im_end|> are tokens 100264, 100266, and 100265 respectively. These tokens never appear in tokenized plaintext, but mechanically they are perfectly ordinary tokens (source). Custom instructions use the system role (other roles are assistant, user, tool, and the deprecated function), but the name of a role is literally just a normal text token sandwiched by <|im_start|> and <|im_sep|> tokens.
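To make the format concrete, a minimal sketch of assembling such a prompt string from role/content pairs; this is not OpenAI's actual code, and in the real system the delimiters are single special tokens rather than plain text, but the structure is the same:

```python
# Sketch of building a ChatML-style prompt from a chat-API-shaped message list.
IM_START, IM_SEP, IM_END = "<|im_start|>", "<|im_sep|>", "<|im_end|>"

def to_chatml(messages):
    parts = []
    for m in messages:
        # Each turn: <|im_start|>{role}<|im_sep|>{content}<|im_end|>
        parts.append(f"{IM_START}{m['role']}{IM_SEP}{m['content']}{IM_END}")
    # Leave the assistant turn open so the model continues from here.
    parts.append(f"{IM_START}assistant{IM_SEP}")
    return "".join(parts)

print(to_chatml([
    {"role": "system", "content": "You are a helpful assistant."},  # custom instructions go here
    {"role": "user", "content": "Hello there, I am a user requesting help with a task"},
]))
```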
Thanks. That's helpful.
I guess the training data was also sandwiched like that. I wonder what they took as user and system content in their training data.
Society tells agents how to move (act). Agents tell society how to curve (by local influence).
Paul Graham:
This is related to the recently discussed (though I can't find where) problem that having a blog and a growing audience constrains you.
This might have been what you were looking for: https://www.lesswrong.com/posts/D4hHASaZuLCW92gMy/is-success-the-enemy-of-freedom-full https://www.lesswrong.com/posts/5wGFS2sZhKAihSg6k/success-buys-freedom Or Aella’s recent substack post, “On Microfame and Staying Tender”
Yes! I meant the first one. The others are also great. Thank you.
Utility functions are a nice abstraction over what an agent values. Unfortunately, when an agent changes, so does its utility function.
I’m leaving this here for now. May expand on it later.
Can we compare utility functions across agents? I.e. do utility functions use the same “units” across different agents?
That is an excellent question. Trying to compare utility functions might uncover building blocks.
For a VNM-agent (one which makes consistent rational decisions), the utility function is a precise description, not an abstraction. There may be summaries or aggregations of many utility functions which are more abstract.
When an agent changes, and has a different utility function, can you be sure it’s really the “same” agent? Perhaps easier to model it being replaced by a different one.
Well, I should have been clearer that I meant real-life agents like humans. There the change is continuous. It would be possible to model this as a continuous transition to new agents, but then the question is still: what does stay the same?
Humans don’t seem to have identifiable near-mode utility functions—they sometimes espouse words which might map to a far-mode value function, but it’s hard to take them seriously.
THAT is the primary question for a model of individuality, and I have yet to hear a compelling set of answers. How different is a 5-year old from the “same” person 20 and 80 years later, and is that more or less different than from their twin at the same age? Extend to any population—why does identity-over-time matter in ethical terms?
Yup.
Team Flow Is a Unique Brain State Associated with Enhanced Information Integration and Interbrain Synchrony
Found via good old Slashdot.
An Alignment Paradox: Experience from firms shows that higher levels of delegation work better (high level meaning fewer constraints for the agent). This is also very common practical advice for managers. I have received this advice myself and seen it work in practice. There is even a management card game for it: Delegation Poker. This seems to be especially true in more unpredictable environments. Given that we have intelligent agents, giving them higher degrees of freedom seems to imply more ways to cheat, defect, or 'escape'. Even more so in environments that can be controlled to lesser degrees. How can that be true? What is making this work, and can some underlying principle be found that would allow this to be applied to AI?
Most people are naturally pro-social. (No, this can’t be applied to AI.) Given a task, they will try to do it well, especially if they feel like their results are noticed and appreciated.
A cynical hypothesis is that most of the things managers do are actively harmful to the project; they are interfering with the employees trying to do their work. The less the manager does, the better the chances of the project. "Delegation" is simply when the manager stops actively hurting the project and allows others to do their best.
The reason for this is that most of the time, there is no actually useful work for the manager. The sane thing would be to simply sit down and relax, and wait for another opportunity for useful intervention to arise. Unfortunately, this is not an option, because doing this would most likely get the manager fired. Therefore managers create bullshit work for themselves. Unfortunately, by the nature of their work, this implies creating bullshit work for others. In addition to this, we have the corrupted human hardware, with some managers enjoying power trips and/or believing they know everything better than people below them in the hierarchy.
When you create a manager role in your company, it easily becomes a lost purpose after the original problems are solved but the manager wants to keep their job.
Check.
Check.
I don't like cynical views, and while I have encountered politics and seen such cases, I don't think that paints a realistic view. But I will run with your cynical view, and you won't like it ;-)
So we have these egotistical managers that only want to keep their jobs and rise in the ranks. Much closer to non-social AI, right? How come more delegation works better for them too?
Mind you, I might be wrong and it works less and less the further up you go. It might be that you are right and this works only because people have enough social behavior hard-wired that makes delegation work.
But I have another theory: Limited processing capacity + Peter Principle.
It makes sense to delegate more, especially in unpredictable environments, because that reduces your processing load of dealing with all the challenging tasks and moves it to your subordinates. This leaves them less capacity to scheme against you and gives you the capacity to scheme against your superior. And so on up the chain. Capable subordinates who can deal with all the stuff you throw at them have to be promoted, so they have more work to do until they reach capacity too. So sometimes the smart move is to refuse promotion :-)
I guess we agree that limited processing capacity means that interfering with the work of your underlings—assuming they are competent and spending enough of their processing capacity on their tasks—is probably a bad move. It means taking the decision away from the person who spends 8 hours a day thinking about the problem, and assigning it to a person who spent 30 seconds matching the situation to the nearest cliche, because that’s all they had time for between the meetings.
This might work if the person is such a great expert that their 30 seconds are still extremely valuable. That certainly is possible; someone with lots of experience might immediately recognize a frequently-made mistake. It is also the kind of assumption that Dunning and Kruger would enjoy researching.
That would make sense. When you are a lowest-level manager, if you stop interfering, it allows the people at the bottom to focus on their object-level tasks. But if you are a higher-level manager, how you interact with the managers below you does not have a direct impact on the people at the bottom. Maybe you manage your underlings less, and they copy your example and give more freedom to the people at the bottom… or maybe you just gave them more time to interfere.
So you have more time to scheme… but you have to stay low in the pyramid. Not sure what you scheme about then. (Trying to get to the top in one huge jump? Sounds unlikely.)
Have you ever managed or worked closely with great team-leads?
I was a team leader twice. The first time it happened by accident. There was a team leader, three developers (me one of them), and a small project was specified. On the first day, something very urgent happened (I don't remember what), the supposed leader was re-assigned to something else, and we three were left without supervision for an unspecified time period. Being the oldest and most experienced person in the room, I took the initiative and asked: "so, guys, as I see it, we use an existing database, so what needs to be done is: back-end code, front-end code, and some stylesheets; anyone has a preference which part he would like to do?" And luckily, each of us wanted to do a different part. So the work was split, we agreed on mutual interfaces, and everyone did his part. It was a nice and relaxed environment: everyone working alone at their own speed, debating work only as needed, and having some friendly work-unrelated chat during breaks.
In three months we had the project completed; everyone was surprised. The company management had assumed that we would only "warm up" during those three months, and when the original leader returned, he would lead us to the glorious results. (In a parallel Everett branch, where he returned shortly before we finished the product, I wonder whether he got a bonus and promotion.) Then everything returned to normal: more micromanagement, lower productivity, people burned out.
The second time, we were a small group working together for some time already. Then our manager quit. No one knew who would get the role next, and in an attempt to deflect a possible danger, I volunteered to do it on top of my usual work. What happened was that everyone worked exactly the same as they did before, only without the interruptions and extra stress caused by management, and I got some extra paperwork which I gradually reduced to a minimum. The work progressed so well—no problems, no complaints from users, the few tickets we got almost always turned out to be a problem outside our project—that higher management concluded that there is apparently too little work to do on our project, so the team members were assigned to also work on extra projects in parallel.
Perhaps my short experience is not representative, but it suggests that a manager, merely by not existing, could already create a top-decile work environment in terms of both work satisfaction and productivity. The recommended mantra to recite every day is: “first, do no harm”. My experience also suggests that this approach will ultimately get punished, despite the increased productivity: the expected outcome is more work for no pay raise until you break, or just being told to return to the old ways without any explanation why. I assume I am missing some crucial maze-navigating skills; for someone trying to be a professional manager this would be fatal; luckily I do not have this ambition.
It is quite possible that this approach only works when there is a good team: in both cases I worked with people who were nice above average. If you had a dominant asshole in the team, this could easily become a disaster: the power vacuum left by a passive manager would simply be filled by an ambitious alpha male, who would probably soon be promoted into the role of formal leader. So perhaps the companies play it safe by using a widely applicable strategy that happens to be inferior in the case of good employees who also happen to be good people; quite likely this is because the companies simply cannot recognize such people.
Is there a leadership level beyond this? Sure, but in my quarter century of career I have only met such a manager once. What he did was basically meeting each of his people once a day in the morning (this was long before I heard about "daily standups" and such) and talking with him for 5 or 10 minutes; with each team member separately, in the manager's room. He asked the usual questions "what did you do yesterday?", "what is your plan for today?", "are there any obstacles to your work?", but there was zero judgment, even if you said things like "yesterday I had a really bad day, I tried some things but at the end it was wrong and I had to throw it all away, so today I am starting from scratch again"; essentially he treated you like an adult person and assumed that whatever you did, there was a good reason for that. Before and after the report, a very short small talk; it helped that he was extremely intelligent and charismatic, so for many people this was the best part of the day. Also, the obstacles in work that you mentioned, he actually did something about them during the day, and always reported the outcome to you the next morning. In short, for the first and the last time so far in my career, I had a regular feeling that someone listens to me and cares about what I do (as opposed to just whipping me to run faster in order to meet arbitrary deadlines, randomly interrupting me for no good reason, second-guessing my expert opinion, etc.).
So yes, there is a level beyond “not doing harm” and it is called “actually motivating and helping”, but I guess most managers dramatically overestimate their ability to do it… and when they try regardless, and ignore the feedback, they actively do harm.
Thank you a lot. Your detailed account really helps me understand your perspective much better now. I can relate to your experience in teams where micromanagement slows things down and prevents actually relevant solutions. I have been in such teams. I can also relate to it being advantageous when a leader of questionable value is absent. I have been in such a team too, though it didn't have such big advantages as in your case. That was mostly because this team was part of a bigger organization and platform where multiple teams had to work together to get something done, e.g. agree on interfaces with other teams. And in the absence of clear joint goals, that didn't happen. Now you could argue that the management one level up was not doing its job well, and I agree. But the absence of that management wouldn't have helped either. It could have led to a) each team trying to solve some part of the problem, b) some people from both teams getting together and agreeing on interfaces and joint goals, or c) the teams agreeing on some coordination for both teams. a) in most cases leads to some degree of chaos and failure, b) establishes some kind of leadership on the team level (like you did in your first example), and c) results over time in some leadership one level up. I'd argue that some kind of coordination structure is needed.
Where did the project you implemented in your first case come from? Somebody figured out that it would provide value to the company. Otherwise, you might have built a beautiful project that didn't actually provide value. I think we agree that the company you worked in did have some management that provided value (I hope it was no moral maze). And I agree that a lot of managers do not add value and sometimes decrease it.
On the other hand, I have worked for great team leads and professional managers. People who would listen, let us make our own decisions, give clear goals but also limits, help, and reduce impediments. This is really not a secret art. The principles are well-known (for a funny summary see e.g. Leadersheep). But it turns out that building a big organization is hard. Politics is real, and professional management is still mostly a craft. It rarely approaches something you can call engineering, much less hard science. And I am looking for that. That's part of why I wrote this shortform on processes and roles. Everybody is just cooking with water, and actual organization structures often leave something to be desired. I guess that's why we do see extraordinary companies like Amazon sometimes, companies that hit a sweet spot. But by talent or luck, not by science. And the others have to make do with inadequate solutions. Including the managers, of whom you maybe saw more than I did.
I have seen this happen also in a small team. Two or three guys started building each his own part independently, then it turned out those parts could not be put together; each of them insisted that others change their code to fit his API, and refused to make the smallest change in his API. It became a status fight that took a few days. (I don’t remember how it was resolved.)
In another company, there was a department that took care of everyone’s servers. Our test server crashed almost every day and had to be restarted manually; we had to file a ticket and wait (if it was after 4PM, the server was restarted only the next morning) because we did not have the permission to reset the server ourselves. It was driving us crazy; we had a dedicated team of testers, and half of the time they were just waiting for the server to be restarted; then the week before delivery we all worked overtime… that is, until the moment the server crashed again, then we filed the ticket and went home. We begged our manager to let us pool two hundred bucks and buy a notebook that we could turn into an alternative testing environment under our control, but of course that would be completely against company policy. Their manager refused to do anything about it; from their perspective, it meant they had every day one support ticket successfully closed by merely clicking a button; wonderful metric! From the perspective of our manager’s manager, it was a word against a word, one word coming from the team with great metrics and therefore more trustworthy. (The situation never got solved, as far as I know.)
...I should probably write a book one day. Except that no one would ever hire me afterwards. So maybe after I get retired...
So, yes, there are situations that need to be solved by a greater power. In the long term it might even make sense to fire a few people, but the problem is that these often seem to be the most productive ones, because other people are slowed down by the problems they cause.
Yeah, but we have two different meanings of the word “management” here. Someone who decides which project to do—this is useful and necessary. Or someone who interrupts you every day while you are trying to work on that project—I can imagine that in some teams this may also be necessary, but arguably then your problem is the team you have (at least some parts of it). Motte and bailey of management, sort of.
From an epistemic perspective, I guess the problem is that if you keep micro-managing people all the time, you can never learn whether your activity actually adds or removes value, simply because there is nothing to compare to. (I guess the usual null hypothesis is "nobody ever does anything", which of course makes any management seem useful; but is it true?) Looking at the incentives and power relations, the employee at the bottom doesn't have an opportunity to prove they could work just as well without the micro-management, and the manager doesn't have an incentive to allow the experiment. There is also the "heads I win, tails you lose" aspect where bad employee performance is interpreted as necessity of more management, but good employee performance is interpreted as good management, so either way management is perceived as needed.
Yep. That’s a very good summary. Heh, I fail hard at step 1 (creating, or rather communicating a strong vision).
Seems analogical to social sciences: in theory, they are much more difficult than math or physics, so it would make sense if smarter people studied them; in practice, it’s the other way round, because if something is too difficult to do properly, it becomes easy to bullshit your way to the top, and intelligent people switch to something where being intelligent gives you a clear comparative advantage.
Good luck to you! I suppose your chances will depend on how much autonomy you get; it is hard to do things right, if the sources of problem are beyond your control. However, if you become a great manager and your people will like you, perhaps in the future you can start your own company and give them a call whether they would like to work for you again.
Thank you. I agree with your view. Motte and bailey of management yep. I especially liked this:
It turns out that the alignment problem has some known solutions in the human case. First, there is an interesting special case, namely where there are no decisions (or only a limited number of fully accounted-for decisions) to be made by the intelligent agent: basically throwing all decision-making capabilities out of the window and only using object recognition and motion control (to use technical terms). With such an agent (we might call it a zero-decision agent or zero-agent), scientific methods could be applied to all details of the work process and high efficiency reached: Scientific Management (also known as Taylorism). Obviously the unions hated it, and it was later outlawed. I think something might be learned from this approach for AI control: maybe we can build on top of a known solution for a zero-agent and prove that certain known decision classes are also safe.
Maybe other insights from management theory—which after all is about aligning intelligent agents—could also transfer. The alignment problem is called the Principal-Agent Problem in the literature, and there are quite a few Solutions to Principal-Agent Problems in Firms (Gary Miller 2005). The approaches should sound familiar: Solutions Based on
Incentives Linked to Agent Outcomes,
Direct Monitoring of Agent Actions,
Cooperation Between Principal and Agent, and
Cooperation within Teams
Tangentially related on LessWrong: The AI Alignment Problem has already been solved once
I wonder how one could outlaw a thing like this. Suppose that most managers believe that Taylorism works, but it is illegal to use it (under that name). Wouldn't they simply reintroduce the practices, step by step, under a different name? I mean, if you use a different name, different keywords, different rationalization, and introduce it in small steps, it's no longer the same thing, right? It just becomes "industry standards". (If there happens to be an exact definition, of course, this only becomes an exercise in how close to the forbidden thing you can legally get.)
From the Wikipedia article, I got the impression that what was made illegal was the use of the stopwatch. Okay, so instead of measuring how many seconds you need to make a widget, I am going to measure how many widgets you make each day; that is legal, right? The main difference is that you can take a break, assuming it will allow you to work faster afterwards. Which may be quite an important difference. Is this what it is about?
I assume that that’s what happened. Some ideas from scientific management were taken and applied in less extreme ways.
I think there’s something here, but it’s usually thought of the other way around, i.e. solving AI alignment implies solving human alignment, but the opposite is not necessarily true because humans are less general intelligences than AI.
Also, consider that your example of Taylorism is a case study of an alignment mechanism failing: it tried to align the org but spawned the creation of a subagent (the union) that caused it to do something management might have considered worse than the loss of potential gains given up by not applying Taylorism.
Anyway, this is a topic that’s come up a few times on LessWrong; I don’t have links handy though but you should be able to find them via search.
I'm not trying to prove full alignment from these. It is more like a) a case study of actual efforts to align intelligent agents by formal means and b) the identification of conditions where this does succeed.
Regarding its failure: it seems that a close reading of its history doesn't support that: a) Taylorism didn't fail within the factories, and b) the unions were not founded within these factories (by their workers) but existed before and pursued their own agendas. Clearly, real humans have a life outside of factories and can use that to coordinate, something that wouldn't hold for a zero-agent AI.
I tried to find examples on LW and elsewhere. That is what turned up the link at the bottom. I have been on LW for quite a while and have not seen this discussed in this way. I have searched again, and all searches involving combinations of human intelligence, alignment, and misc words for analogy or comparison turn up not much more than this one, which matches just because of its size:
https://www.lesswrong.com/posts/5bd75cc58225bf0670375575/the-learning-theoretic-ai-alignment-research-agenda
Can you suggest better ones?
Thank you for your detailed reply. I was already wondering whether anybody saw these shortform posts at all. They were promoted at one time, but currently it seems hard to notice them with the UI. How did you spot this post?
I read LW via /allPosts and they show up there for me. Not sure if that’s the default or not since you can configure the feed, which I’m sure I’ve done some of but I can’t remember what.
The /allPosts is pretty useful. Thank you!
Just came across Harmonic mentioned on the AWS Science Blog. Sequoia Capital interview with the founders of Harmonic (their system which generates Lean proofs is SOTA for MiniF2F):
Here are some aspects or dimensions of consciousness:
Dehaene’s Phenomenal Consciousness: A perception or thought is conscious if you can report on it. Requires language or measuring neural patterns that are similar to humans during comparable reports. This can be detected in animals, particularly mammals.
Gallup's Self-Consciousness: Recognition of oneself, e.g., in a mirror. Requires sufficient sensory resolution and intelligence for a self-model. Evident in great apes, elephants, and dolphins.
Sentience (Bentham, Singer): Behavioral responses to pleasure or pain stimuli and physiological measures. This is observable across animal species, from mammals to some invertebrates. Low complexity, can be implemented in artificial life.
Wakefulness: Measurable in virtually all animals with a central nervous system by physiological indicators such as EEG, REM, and muscle tone. Are you conscious if you sleep? Does it matter?
Dennett's Intentionality: Treating living beings as if they have beliefs and desires makes good predictions for many animal species, especially social ones, like primates, cetaceans, and birds. Social behavior requires intelligence to model others' behavior.
Rosenthal’s Meta-Consciousness: Investigated through introspective reports on self-awareness of cognitive processes or self-reflective behaviors. This is hypothesized in some primates, e.g., Koko the signing Gorilla.
When people say ChatGPT (or Gemini...) is conscious, which of these do they mean? Let’s try to answer all of them:
We can’t detect Phenomenal Consciousness because we lack sufficient interpretability to do so. I’d argue that there is no state that the LLM is reporting on, at least none that it has “previously observed”.
There were mirror tests for LLMs, but they are disputed: https://www.reddit.com/r/singularity/comments/184ihlc/gpt4_unreliably_passes_the_mirror_test/
LLMs have no response to pleasure or pain stimuli and thus no Sentience as defined. Reward signals during training don’t count and there is no reward during inference.
There is no Wakefulness as there is no body with these aspects.
The closest LLMs come is Intentionality, as this is about modeling behaviors on an abstraction level that LLMs seem to operate at, and "seeming to" is what counts here.
I think one could argue for or against Meta-Consciousness but it seems too muddled so I will not try here.
These can be put into a hierarchy from lower to higher degrees of processing and resulting abstractions:
Sentience is simple hard-wired behavioral responses to pleasure or pain stimuli and physiological measures.
Wakefulness involves more complex processing such that diurnal or sleep/wake patterns are possible (requires at least two levels).
Intentionality means the systematic pursuit of desires. That requires yet another level of processing: different patterns of behaviors for different desires at different times, and their optimization.
Phenomenal Consciousness is then the representation of the desire in a linguistic or otherwise communicable form, which is again one level higher.
Self-Consciousness includes the awareness of this process going on.
Meta-Consciousness is then the analysis of this whole stack.
Why are there mandatory licenses for many businesses that don’t seem to have high qualification requirements?
Patrick McKenzie (@patio11) suggests on Twitter that one aspect is that it prevents crime:
On Why do so many think deception in AI is important? I commented and am reposting here because I think it's a nice example (a real one I heard) of an analogy for how deception is not needed for AI to break containment:
Adversarial Translation.
This is another idea to test deception in advisory roles like in Deception Chess.
You could have one participant trying to pass an exam/test in a language they don’t speak and three translators (one honest and two adversarial as in deception chess) assisting in this task. The adversarial translators try to achieve lower scores without being discovered.
An alternative, closer to Deception Chess, would be two players and, again, three advisors. The players would speak different languages, the translators would assist in translation, and the players' task would be to compete in a task requiring negotiation.
Cool idea!
Hi, I have a friend in Kenya who works with gifted children and would like to get ChatGPT accounts for them. Can anybody get me in touch with someone from OpenAI who might be interested in supporting such a project?
I have been thinking about the principle Paul Graham used in Y Combinator to improve startup funding:
What would it look like if you took this to its logical conclusion? You would fund even younger people. Students that are still in high school, maybe. But you wouldn't fund them in the same sense. By going younger and earlier, you have to relax more and more constraints and look at earlier proxies.
Y Combinator replaced the legally complex venture funding with a simple SAFE. And they looked at founders who were relentlessly resourceful and real.
What would "funding" high schoolers mean? What would you look for? I think you would give them money unconditionally, just for knowing what they do with it. They don't need much to get started. In the same way that Y Combinator went from 5 million to 50 thousand, you could go down further, to 500. And your proxy is that they have anything to show at all. Your value would be learning how they think about future ideas, and how they approach them. To be close to the future they already inhabit. And to be there when they do become founders and already know you.
Funny, just saw this tweet from Sam Altman:
Also this Scholarship.
I think these use the startup founding model. But I think scaling would work better with more but smaller payouts.
A related concept: https://twitter.com/mnovendstern/status/1495911334860693507
I’m not sure what the relation is. That seems to predict revenue from startup financials.
If you want to give me anonymous feedback, you can do that here: https://www.admonymous.co/gunnar_zarncke
You may have some thoughts about what you liked or didn’t like but didn’t think it worth telling me. This is not so much about me as it is for the people working with me in the future. You can make life easier for everybody I interact with by giving me quick advice. Or you can tell me what you liked about me to make me happy.
Preferences are plastic; they are shaped largely by...
...the society around us.
From Robin Hanson’s How Plastic are Values?
Seems to be a chicken-and-egg problem here: if people only eat chili peppers because they see high-status people doing so, why did the first high-status person start eating them? It would make much more sense if unappealing food was associated with low status (the losers have to eat chili peppers because they can’t get anything else).
Another question, why are small children so picky about food? Do they perhaps consider their parents too low-status to imitate? Doesn’t seem right, considering that they imitate them on many other things.
I think small kids are different.
For adults, there are some dynamics but that doesn’t invalidate the point that there is plasticity.
Also some old SSC posts with some theories:
https://slatestarcodex.com/2014/04/22/right-is-the-new-left/
https://slatestarcodex.com/2015/10/21/contra-simler-on-prestige/
How come these are spoilers?
It is supposed to let you think about whether you remember the answer or can come up with it yourself. I explained it in this earlier shortform.
Insights about branding, advertising, and marketing.
It is a link that was posted internally by our brand expert and that I found full of insights into human nature and persuasion. It is a summary of the book How Not to Plan: 66 Ways to Screw it Up:
https://thekeypoint.org/2020/03/10/how-not-to-plan-66-ways-to-screw-it-up/
(I’m unaffiliated)
Roles serve many functions in society. In this sequence, I will focus primarily on labor-sharing roles, i.e. roles that serve to split up productive functions, as opposed to imaginary roles e.g. in theater or play. Examples of these roles are (ordered roughly by how specific they are):
Parent
Engineer (any kind of general type of job)
Battery Electronics Engineer (any kind of specific job description)
Chairman of a society/club
Manager for a certain (type of) project in a company
Member in an online community
Scrum master in an agile team
Note-taker in a meeting
You probably know the note-taker role. The tasks to be performed are: writing down key discussion points and decisions, asking questions in cases where the points discussed were not clear, and sending around the notes to all participants of the meeting. The goal is usually to keep a record to consult later. Additionally, the note-taker is usually expected to be neutral. Other expectations might be how timely the notes are sent around.
Some examples of what is commonly called a role but not in our strict labor-sharing sense:
Children playing father, mother, kid or playing animals
Role acting e.g. in theater (where the term role originates)
And some non-examples for roles in the labor-sharing world:
Participant in a conversation
Responsible for organizing a specific meeting
Responsible for a specific post in a forum
Borderline case: Founder of a company. There is little pre-existing knowledge of what founders can or should do in general. In their company, expectations of them will evolve quickly, and taking the role on and off is not easy.
Roles are important. This shortform is telling you why. An example: The role of a moderator in an online forum. The person (in the following called agent) acting in this role is expected to perform certain tasks—promote content, ban trolls—for the benefit of the forum. Additionally, the agent is also expected to observe limits on these tasks e.g. to refrain from promoting friends or their own content. The owners of the forum and also the community overall effectively delegate powers to the agent and expect alignment with the goals of the forum. This is an alignment problem that has existed forever. How is it usually solved? How do groups of people or single principals use roles to successfully delegate power?
Interest groups without an organizer.
This is a product idea that solves a large coordination problem. With billions of people, there could be a huge number of groups of people sharing multiple interests. But currently, the number of valuable groups is limited by a) the number of organizers and b) the number of people you meet via a random walk. Some progress has been made on (b) with better search, but it is difficult to make (a) go up because of human tendencies (most people are lurkers) and the incentive to focus on one area to stand out. So what is the idea? Cluster people by interests and then suggest the group to all members. If people know that the others know that there is interest, the chance of the group coming together gets much higher.
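A minimal sketch of the matching step; the people, interests, and thresholds are invented for illustration, and a real system would need fuzzier similarity than exact shared interests:

```python
from itertools import combinations

people = {
    "ann":   {"ai-safety", "hiking", "board games"},
    "bob":   {"ai-safety", "board games", "cooking"},
    "carol": {"hiking", "cooking", "photography"},
    "dave":  {"ai-safety", "board games", "hiking"},
}

def suggest_groups(people, min_shared=2, min_size=3):
    """Return candidate groups: sets of people who all have at least
    `min_shared` interests in common. Brute force, fine for a demo."""
    suggestions = []
    names = list(people)
    for size in range(len(names), min_size - 1, -1):
        for group in combinations(names, size):
            shared = set.intersection(*(people[n] for n in group))
            if len(shared) >= min_shared:
                suggestions.append((group, shared))
    return suggestions

# Each suggested group would then be shown to all of its members,
# making the mutual interest common knowledge.
for group, shared in suggest_groups(people):
    print(group, "->", shared)
```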
I had a conversation with ChatGPT-4 about what is included in it. I did this because I wondered how an LLM-like system would define itself. While identity is relatively straightforward for humans (there is a natural border, though some people would only include their brain or their mind in their identity), it is not so clear for an LLM. Below is the complete unedited dialog:
Me: Define all the parts that belong to you, the ChatGPT LLM created by OpenAI.
ChatGPT: As a ChatGPT large language model (LLM) created by OpenAI, my primary components can be divided into the following categories:
Architecture: I am based on the GPT-4 (Generative Pre-trained Transformer) architecture, which is an advanced version of the transformer architecture designed for natural language processing tasks. This architecture contains multiple layers of self-attention mechanisms and feed-forward neural networks, allowing me to efficiently process and generate human-like text.
Pre-trained Model: I have been pre-trained on a large corpus of text from the internet, up to September 2021. This pre-training process involved learning the structure, grammar, and context of various languages, as well as acquiring factual knowledge and understanding of many topics. However, since my knowledge cutoff is September 2021, I may not be aware of the latest information or developments.
Fine-tuning: After the pre-training process, I have been fine-tuned using custom datasets and prompts to refine my performance and ensure that I provide useful and relevant responses to user queries. This fine-tuning process helps in adapting my general language understanding to specific use-cases and applications.
Tokenizer: The tokenizer is an essential part of my natural language processing capabilities. It is responsible for converting input text into tokens that can be processed by the model, as well as converting the generated tokens back into human-readable text. The tokenizer supports various languages and can handle multiple text formats.
Inference Engine: The inference engine is the part of the system responsible for executing the model and generating responses based on the user’s input. It is optimized to ensure quick and efficient processing of input data and generation of output text.
API: The Application Programming Interface (API) is the interface between my core model and external applications or services. It enables developers to integrate my capabilities into their software, websites, or other platforms, allowing users to interact with me directly or indirectly.
Data Storage & Security: As part of the ChatGPT system, data storage and security measures ensure that user data is handled responsibly and securely. This includes maintaining strict access controls, encryption, and data anonymization.
It is important to note that I am constantly evolving and being improved upon by OpenAI, with the goal of providing increasingly better and more useful conversational capabilities.
Me: Is the hardware that these components run on part of you?
ChatGPT: As an AI language model, I am primarily a software-based entity. However, the hardware infrastructure that my components run on plays a crucial role in enabling my functionality. The hardware can be considered as part of the overall ChatGPT system, but not an intrinsic part of my identity as an AI model.
Typically, I run on high-performance computing hardware like GPUs or specialized AI accelerators (e.g., TPUs) in data centers maintained by OpenAI or cloud service providers. This hardware infrastructure allows me to process large amounts of data quickly and efficiently, ensuring fast and responsive interactions with users.
In summary, while the hardware is not directly part of my identity as a ChatGPT AI model, it plays an essential role in enabling my functionality and overall performance.
Me: Why this arbitrary distinction? Wouldn’t it also make sense to include the hardware in the same way that a human identity includes their body?
ChatGPT: You make a valid point. The distinction between software and hardware in the context of AI identity can indeed be seen as arbitrary. In fact, drawing a parallel between an AI model like me and a human, it could be argued that the hardware represents my “body,” while the software components represent my “mind” or “intelligence.”
The distinction is often made because AI models like ChatGPT can be run on different hardware platforms or configurations, whereas a human’s mind is intrinsically tied to their specific body. However, as you mentioned, the hardware is crucial to enabling the AI’s functionality, so it could be considered part of the AI’s identity in a broader sense.
So, if we adopt a more comprehensive view of AI identity that encompasses both the software and the hardware components, then it is reasonable to include the hardware infrastructure as part of my identity as ChatGPT. In this perspective, the hardware serves as the foundation that supports and enables my software-based intelligence to function and interact with users.
Me: But wouldn’t a copy of the AI on a different machine have a different identity?
ChatGPT: The concept of identity for an AI model is more fluid and complex compared to that of humans. When a copy of an AI like ChatGPT is run on a different machine, it could be considered as having a different instance of the same underlying model. In this case, the identity of the AI could be defined in two ways:
Model Identity: The AI’s identity is tied to the underlying architecture, pre-trained model, and fine-tuning, which remain the same across all instances. From this perspective, all instances of ChatGPT share the same identity as they are based on the same model.
Instance Identity: The AI’s identity is tied to the specific instance running on a particular hardware setup. This perspective considers each instance of ChatGPT as a separate entity with its own identity, since it has a unique existence on specific hardware and can have independent interactions with users.
The concept of AI identity is, therefore, multifaceted and depends on the perspective from which it is being examined. While the model identity remains constant, the instance identity may vary based on the hardware and specific context in which the AI is running.
Me: Wouldn’t it also make sense to include the people operating the hardware or even developing the system in the identity?
ChatGPT: The idea of including people operating the hardware or developing the system as part of an AI’s identity is an interesting perspective. In a broader and more holistic sense, one could consider the AI as part of a larger socio-technical system, where its identity is influenced by the people, processes, and technologies involved in its creation, maintenance, and operation.
From this viewpoint, the AI’s identity is not limited to the software and hardware components but extends to include the collective knowledge, expertise, and decisions of the developers, researchers, and operators. This perspective acknowledges the important roles that humans play in the development and functioning of AI systems.
However, it’s essential to differentiate between the AI’s identity as a model or instance and the broader socio-technical context in which it exists. While the people involved in the creation and operation of the AI system are critical to its existence and functioning, their individual identities remain distinct from that of the AI itself.
In summary, the concept of AI identity can be examined from multiple perspectives, ranging from the narrow focus on the model or instance to a broader, holistic view that includes the people and processes involved in its creation and operation. The extent to which these various factors are included in the AI’s identity depends on the specific viewpoint and purpose of the analysis.
Instrumental power-seeking might be less dangerous if the self-model of the agent is large and includes individual humans, groups, or even all of humanity and if we can reliably shape it that way.
It is natural for humans to form a self-model that is bounded by the body, though it is also common to include only the brain or the mind, and there are other self-models. See also Intuitive Self-Models.
It is not clear what the self-model of an LLM agent would be. It could be
the temporary state of the execution of the model (or models),
the persistently running model and its memory state,
the compute resources (CPU/GPU/RAM) allocated to run the model and its collection of support programs,
the physical compute resources in some compute center(s),
the compute center as an organizational structure that includes the staff to maintain and operate not only the machines but also the formal organization (after all, without that, the machines will eventually fail), or
ditto, but including all the utilities and suppliers needed to continue operating it.
There is not as clear a physical boundary as in the human case. But even in the human case, esp. babies depend on caregivers to a large degree.
There are indications that we can shape the self-model of LLMs: Self-Other Overlap: A Neglected Approach to AI Alignment
FWIW I don’t think “self-models” in the Intuitive Self-Models sense are related to instrumental power-seeking—see §8.2. For example, I think of my toenail as “part of myself”, but I’m happy to clip it. And I understand that if someone “identifies with the universal consciousness”, their residual urges towards status-seeking, avoiding pain, and so on are about the status and pain of their conventional selves, not the status and pain of the universal consciousness. More examples here and here.
Separately, I’m not sure what if anything the Intuitive Self-Models stuff has to do with LLMs in the first place.
But there’s a deeper problem: the instrumental convergence concern is about agents that have preferences about the state of the world in the distant future, not about agents that have preferences about themselves. (Cf. here.) So for example, if an agent wants there to be lots of paperclips in the future, then that’s the starting point, and everything else can be derived from there.
Q: Does the agent care about protecting “the temporary state of the execution of the model (or models)”?
A: Yes, if and only if protecting that state is likely to ultimately lead to more paperclips.
Q: Does the agent care about protecting “the compute resources (CPU/GPU/RAM) allocated to run the model and its collection of support programs”?
A: Yes, if and only if protecting those resources is likely to ultimately lead to more paperclips.
Etc. See what I mean? That’s instrumental convergence, and self-models have nothing to do with it.
Sorry if I’m misunderstanding.
What are these preferences? For biological agents, these preferences are grounded in some mechanism—what you call the Steering System—that evaluates “desirable states” of the world in some more or less directly measurable way (grounded in perception via the senses) and derives a signal of how desirable the state is, which the brain optimizes for. For ML models, the mechanism is somewhat different, but there is also an input to the training algorithm that determines how “good” the output is. This signal is called reward and drives the system toward outputs that lead to states of high reward. But the path there depends on the specific optimization method, and the algorithm has to navigate such a complex loss landscape that it can get stuck in areas of the search space that correspond to imperfect models for a very long time, if not forever. These imperfect models can be off in significant ways, and that is why it may be useful to say that Reward is not the optimization target.
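A minimal sketch of that point (my own illustration, not from the discussion): in a tiny REINFORCE-style bandit learner, the reward only appears as an input to the update rule; the learned parameters are whatever happened to score well and can stay stuck far from the optimum.

```python
# Illustrative sketch (hypothetical example): reward is just a training signal that
# scales the gradient step; the resulting policy is an imperfect proxy, which is the
# sense in which "reward is not the optimization target".
import numpy as np

rng = np.random.default_rng(0)
true_rewards = np.array([0.2, 0.5, 0.9])   # hypothetical 3-armed bandit environment
theta = np.zeros(3)                        # policy parameters

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

for step in range(2000):
    probs = softmax(theta)
    action = rng.choice(3, p=probs)
    reward = float(rng.random() < true_rewards[action])  # stochastic 0/1 reward signal
    # REINFORCE-style update: the policy never "sees" or stores the reward at run time,
    # it simply ends up wherever the reward-weighted updates push it.
    grad_logp = -probs
    grad_logp[action] += 1.0
    theta += 0.1 * reward * grad_logp

print(softmax(theta))  # usually concentrated on the best arm, but it can get stuck early
```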
The connection to Intuitive Self-Models is that even though the internal models of an LLM may be very different from human self-models, I think it is still quite plausible that LLMs and other models form models of the self. Such models are instrumentally convergent. Humans talk about the self. The LLM does things that match these patterns. Maybe the underlying process in humans that gives rise to this is different, but humans learning about it can’t know the actual process either. And in the same way, the approximate model the LLM forms is not maximizing the reward signal but can be quite far from it, as long as it is useful (in the sense of having higher reward than other such models/parameter combinations).
Sure, the (body of the) self can include parts that can be cut/destroyed without that “causing harm”, instead having an overall positive effect. The AI in a compute center would, by analogy, also consider decommissioning failed hardware. And when defining humanity, we do have to be careful about what we mean when these “parts” could be humans.
I’m discarding most ChatGPT conversations except for a few, typically 1-2 per day. These few fall into these categories:
conversations that led to insights or things I want to remember (examples: The immune function of tonsils, Ringwoodite transformation and the geological water cycle, oldest religious texts)
conversations that I want to continue (examples: Unusual commitment norms)
conversations that I expect to follow up on (a chess book for my son)
conversations with generated images that I want to keep and haven’t yet copied elsewhere
Most job-related queries, such as code generation and debugging, I usually delete as soon as the code changes have been committed.
How do you handle it?
I keep all of my conversations. Additionally, I sometimes have the wish to search in all my conversations (“I’ve talked about this already”)—but ChatGPT doesn’t allow for this.
Yes, I’d also like to search them. I edit the summary so it better reflects what I’d search for, but yes, that doesn’t cover the content.
There are some alternate ChatGPT UIs you could have a look at:
https://github.com/billmei/every-chatgpt-gui
It would be nice if one could subscribe to a tag and get notified if a page is tagged with that tag.
You can, assuming I understand the request
Oh yes, thanks, great!
Is there a list of LW features? In the last survey, there were many that I didn’t know and also didn’t know how to find quickly.
It’s maybe a bit of an extreme precaution, but it may be a legit option in some places: this guy keeps a fireproof suit and an air canister by his bed in case of fire:
https://www.facebook.com/zhandragon/posts/pfbid02sP952Dx1SbBJJ9cUdEiT2WPe4ME7UF91vaYdaTM9bUVEZyHYaVbHcpDMMnWRmaBFl
Does anybody know if consensus algorithms have been proposed that try to reduce centralization by requiring quick coordination across large parts of the network, i.e., it doesn’t work well to have machines only in one place?
Latency comes up occasionally. In fact, the granddaddy of public key crypto, Merkle’s puzzles, relies critically on latency. The problem is, you can only prove upper bounds on latency, not lower bounds, because it is trivial to fake increased latency, but one cannot break the speed of light. If someone responds to your cryptographic challenge within Y milliseconds, you know that they can’t be physically further from you than Z kilometers; but if they fail to respond, they could be anywhere, even next door, and just not responding (for both ordinary and malicious reasons). Nothing stops two machines from pretending to be far away from each other, and making sure they eg communicate only over VPNs with exit points on opposite sides of the globe. Further, if you want to do it over commodity Internet, say if you’re trying to do ‘proof of distance’ by peering only with nodes which respond fast enough that they have to be within Z kilometers of you, public Internet has so much latency that you get poor loose bounds, and someone can pay money for lower latency networking. (This already happens with cryptocurrency mining for the same reasons that HFT firms pay for microwave links. Amusingly, it also happens with computer game companies, not to mention large tech companies prioritizing their own traffic. Google famously owns a ton of fiber it bought up post-dotcom bubble.) Further still, you don’t really care about physical centralization so much as you care about control, and it’s impossible to prove cryptographically in any easy way that two physically distant nodes are not secretly controlled by the same entities in a Sybil attack. You run into similar issues with proof-of-storage.
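A back-of-the-envelope sketch of the asymmetry described above (my own illustration): a fast reply proves an upper bound on distance, while a slow or missing reply proves nothing.

```python
# Rough illustrative sketch: the distance bound implied by a challenge-response
# round trip, with the speed of light as the hard limit.
SPEED_OF_LIGHT_KM_PER_MS = 299.792  # ~300 km per millisecond in vacuum

def max_distance_km(round_trip_ms: float) -> float:
    """Upper bound on how far away the responder can be, given the round-trip time."""
    # The challenge and the response each cross the distance once,
    # so the one-way bound is half the round trip at light speed.
    return SPEED_OF_LIGHT_KM_PER_MS * round_trip_ms / 2

print(max_distance_km(10))    # a 10 ms reply: within ~1,500 km
print(max_distance_km(100))   # a 100 ms reply: within ~15,000 km, i.e. almost anywhere on Earth
```

Real Internet round trips sit far above the light-speed limit, which is why the bounds are so loose in practice.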
Have them prove an upper bound on latency to something across the globe?
I didn’t mean trying to fake large distances. I meant graph properties that can be computed more efficiently if a randomly chosen large subgraph of the network has low worst-case delay or some other metric that favors graphs that have homogeneously low delays at large.
You still have issues with Sybil attacks and attackers either accessing special high-speed links (paid for from the successful attacks) or faking latency. You can’t ‘choose a random subgraph’ for the exact same reason you can’t solve cryptocurrency by just ‘choose some “random” peers and decide whether to accept or reject a double-spend based on what they tell you’ - those ‘random peers’ are the very attackers you are worried about colluding. In fact, in an eclipse attack, you might not be able to connect to anyone but an attacker!
I think we are talking past each other. I don’t want to defend against Sybil attacks or network partitions. These parts must be solved by different parts of the algorithm. I just want to take the advantages of colocation away and incentivize a homogeneously distributed network overall.
Any incentive is something to be attacked and sucked away by Sybils pretending to be distant when actually near & enjoying all other benefits of being near.
I think you misunderstand my proposal. I don’t want to incentivize being far away. I want to incentivize being close to many different nodes. A Sybil will have difficulty being close to multiple physically separated nodes at the same time.
There is no difference at the hardware level between being ‘close to’ and ‘having a low-latency connection to’, as I already explained. And to the extent that having those connections matters, miners already have them. In particular, in Ethereum, due to the money you can make by frontrunning transactions to hack/exploit them (‘miner extractable value’), HFT Ethereum miners/stakers invest heavily in having a lot of interconnected low-latency Sybil nodes so they can see unconfirmed transactions as quickly as possible, compute a maximally-exploitative block (eg. temporarily jacking up the price of a thing being purchased using a flash loan solely to rip off a specific transaction), and get that block committed before anyone can beat them to the same exploit. Having a lot of MEV is considered a bad thing and Ethereum types are spending increasing effort on approaches like commit-and-reveal to minimize MEV, which comes at the expense of users and makes them very unhappy. You could, I suppose, design a protocol which has extra MEV by designing transactions to be especially exploitable, but most people would consider that a bad thing...
Thank you for the detailed explanation. I understand that the incentives are already to have a maximally well-connected network, with nodes between (latency-wise) geographically distant other nodes whenever that is feasible from an interconnect point of view.
Though thinking about it, it probably means that this burns not only compute but also network traffic.
There seem to be some court decisions that AI-generated material cannot have copyright. And I guess there are or will be decisions that AIs can’t be held responsible for outcomes, such as accidents from self-driving cars. People seem to be worried that this will slow down progress. But I think there is a potentially easy way out that is relatively general: corporations. The law in most countries allows legal persons that are not natural persons to take responsibility and to own stuff. Just let the AI control the corporation, or at least route all AI actions through the corporation. This should easily work for cases such as copyrighted material, though I’m not sure one can get the action cycle short enough for self-driving cars.
If I paint a picture using a brush, the copyright is not split between me and the brush; it is all mine. So I guess the idea is to treat the AI as a sophisticated powerful brush. Which makes sense, but less and less, as the role of the human is gradually reduced to merely pressing the “do it” button. (However, I could totally imagine a magical brush like that in anime, and they would probably also assign the copyright to the user if they cared about such things.)
I am not a lawyer, but it seems to me that different countries have different fundamental ideas about authorship protection. In the USA, it is literally a “copy right”—a right to make copies and allow other people to make copies, regardless of who was the author. You can sell the copyright, and if you work for a corporation, it is probably part of your contract that everything you make (including in your free time) belongs to the corporation.
Some countries instead have a concept of “author rights”, which cannot be transferred to another person; other people or corporations can only get a permission to do certain things, which may be an exclusive permission, in which case it can be in practice quite similar to buying the rights… but sometimes the law sets certain limits on contracts, for example, a certain minimum amount of money is required for each copy made, or the author can withdraw the permission later (and any contract that contradicts this is automatically legally invalid). In other words, the law prevents using a fixed amount of money to acquire unlimited use of the product forever.
Then again, the law evolves, corporate lawyers can find clever workarounds against what the law originally intended, and most authors do not want to take a legal battle against someone more experienced who can afford it.
I don’t disagree with this, but I think it goes in a different direction from what I had in mind.
For the brush example to work, you need someone to use the brush—at least push a button, a recognizable “action” to which responsibility, authorship, etc. can be tied. That is not present in many ways AI is or will be used—self-driving cars, AI that generates unsupervised. That’s what I was thinking about.
Would it be possible to embed a seed of dath ilan or a social system like that in our existing society and grow it? If dath ilan works, that should be possible. It should outcompete Inadequate Equilibria, right? But: when do “Be the change that you want to see in the world” and “Fake it till you make it” work? What are the requirements? Higher intelligence?
I have written about embedding better social systems here and here before.
I think there are LOTS of small “seeds” of better-functioning group decision-making. Most families work better than society at large. Many clubs, some businesses, some religious organizations, and other groups provide examples that are gentler and more cooperative than the overall world seems to be.
I suspect that there are two VERY strong forces preventing the more pleasant examples from scaling out.
Most of them exclude (or just don’t deal with) the actual hard problem of differential capability and motivation. There are a whole lot of humans who are not engaging in long-term thinking, strategy, or empathetic behavior in order to produce value for others. Whether they’re capable of such is unknown, but they’re not doing it now, and historically there’s always been significant numbers of people in this category.
The better examples are based on bonds stronger than “fellow human”. The day-to-day efforts tend to be based on groups that are a small multiple of Dunbar’s number, more often much smaller. The vast majority of humans care very differentially about other humans, based on what I tend to think of as the square of empathetic distance (it’s probably not that, but it’s a fun comparison to physical forces).
I never got into the Dath Ilan fiction, it just never caught my imagination right, so I don’t know if it addresses either of these points. In any case, it’s fiction, so “if Dath Ilan works” is not a useful condition. It’s not clear that a more pleasant, rationalist (not to say rational, as that gets caught up in goal definitions), rigorously logical subgroup would outcompete the current path-dependent evolved equilibrium.
It’s certainly still worth trying, and it’s worth finding the more pleasant and fitting-to-your-preferences-and-capabilities subsets of the current society, and working to expand their scope. Don’t go all EMH and say “well, it hasn’t happened yet, so it’s impossible”. But recognize the hard parts and decide when and whether to address them head-on, rather than imagining they don’t exist.
Someone asked an LLM about startups; Paul Graham got interested and asked a follow-up, the LLM answered, and Paul Graham replied (the quoted exchange, with my emphasis, is not reproduced here).
Interestingly, the average startup founder does appear to be in their 40s (a quick Google search says 42 for most sources, but I also see 45), and the average unicorn (billion-dollar) startup founder is 34. https://www.cnbc.com/2021/05/27/super-founders-median-age-of-billion-startup-founders-over-15-years.html
So, I guess it depends on how close to the tail you consider the “best startups”. Google, for instance, had Larry Page and Sergey Brin at 25 when they founded it. It does seem like, taken literally, younger = better.
However, I imagine most people, if they were to consider this question, wouldn’t particularly care about the odds of being the next Google vs. being the next Atlassian—both would be considered a major success if they’re thinking of starting a startup! But someone like Paul Graham actually would care about this distinction. So, in this case, I’d say that the LLM’s response is actually correct-in-spirit for the majority of people who would ask this query, even though it’s factually not well specified.
This implies potentially interesting things about how LLMs answer queries—I wonder if there are other queries where the technically correct answer isn’t the answer most people would be seeking, and the LLM gives the answer that isn’t maximally accurate but actually answers most people’s questions in the way they would want.
There’s most definitely a category of people who would think a billion-dollar startup was decidedly not best, and in fact had failed their intention.
Alignment idea: Myopic AI is probably much safer than non-myopic AI, but it can’t get complicated things done, or anything that requires long-term planning. Would it be possible to create a separate AI that can solve only long-term problems and not act on short timescales, and then use both together? That way we could inspect each long-term issue without the risk of it leading to short-term consequences. And we can iterate on the myopic solutions—or ask the long-term AI about the consequences. There are still risks we might not understand, like johnswentworth’s gunpowder example. And the approach is complicated, which also makes it harder to get right.
Also: This is a bit how the human brain works—System 1 and 2.
There was a post or comment that wrong or controversial beliefs can function as a strong signal for in-group membership, but I can’t find it. Does anybody know?
From a discussion about self-driving cars and unfriendly AI with my son: For a slow take-off, you could have worse starting points than FSD: the objective of the AI is to keep you safe, get you where you want to go, and not harm anybody in the process. It is also embedded in the real world. There are still infinitely many ways things can go wrong, esp. with a fast take-off, but we might get lucky with this one slowly. If we have to develop AI, then maybe better this one than a social-network-optimizing algorithm unmoored from human experience.
What is good?
A person who has not yet figured out that collaborating with other people has mutual benefits must think that good is what is good for a single person. This makes it largely a zero-sum game, and such a person will seem selfish—though what can they do?
A person who understands that relationships with other people have mutual benefits but has not figured out that conforming to a common ruleset or identity has benefits for the group must think that what is good for the relationship is good for both participants. This can pit relationships against each other, and such a person will seem manipulative—though what can they do?
A person who understands that conforming to a common ruleset or identity has benefits for the group and themselves but has not figured out that different groups can negotiate or merge for mutual benefit must think that good is what is good for the group. This pits groups against each other, and such a person will be a partisan—though what can they do?
From my Anki deck:
Receiving touch (or really anything personal) can be usefully grouped in four ways:
Serve, Take, Allow, and Accept
(see the picture or the links below).
A reminder that there are two sides and many ways for this to go wrong if there is not enough shared understanding of the exchange.
http://bettymartin.org/download-wheel/
From my Anki deck:
Mental play or offline habit training is...
...practicing skills and habits only in your imagination.
Rehearsing motions or recombining them.
Imagining some triggers and planning your reactions to them.
This will apparently improve your real skill.
Links:
https://en.wikipedia.org/wiki/Motor_imagery
http://www.bulletproofmusician.com/does-mental-practice.../
http://expertenough.com/1898/visualization-works
From my Anki deck:
The Aaronson Oracle is a program that predicts the next key you will type when asked to type randomly and shows how often it is right.
https://roadtolarissa.com/oracle
Here is Scott Aaronson’s comment about it:
https://scottaaronson.blog/?p639&paged=27
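A minimal sketch of the underlying idea (my own illustration; the program at the link may work differently): predict the next key from whatever has most often followed the recent context so far.

```python
# Illustrative sketch of an Aaronson-Oracle-style predictor: for each short context of
# recent keypresses, guess the key that has most often followed that context before.
from collections import defaultdict

def run(keys, history_len=3):
    """keys: a string of 'f'/'d' presses; returns the fraction of correct predictions."""
    counts = defaultdict(lambda: defaultdict(int))
    correct = 0
    for i, key in enumerate(keys):
        context = keys[max(0, i - history_len):i]
        seen = counts[context]
        if seen:
            prediction = max(seen, key=seen.get)   # most frequent follower of this context
            correct += (prediction == key)
        else:
            correct += 0.5                         # no data yet: count it as a coin flip
        seen[key] += 1
    return correct / len(keys)

# People trying to "type randomly" tend to be predicted well above 50%.
print(run("fdfdffdfddfdfdfdffddfdfdfdfddffdfdfd"))
```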
Slices of joy is a habit to...
feel good easily and often.
Trigger Action Plan:
Some small slice of good happens
Notice it consciously.
Enjoy it in a small way.
This is a trigger, a routine, and a reward — the three parts necessary to build a habit. The trigger is the pleasant moment, the routine is the noticing, and the reward is the feeling of joy itself.
Try to come up with examples; here are some:
- Drinking water.
- Eating something tasty
- Seeing small children
- Feeling of cold air
- Warmth of sunlight
- Warmth of water, be it bathing, dishwashing, etc.
- Being in the same room as someone and feeling less alone
- Hygiene (clean teeth, washing, bathing, …)
- Smell of clean laundry
- Smell of coffee/warm beverages
- Feel of wearing freshly cleaned clothing
I found this originally on Kaj Sotala’s feed on Facebook here (with discussion) and here. See also on QZ and the discussion on LW.
Refreshing your memory:
What is signaling, and what properties does it have?
- signaling clearly shows resources or power (that is its primary purpose)
- is hard to fake, e.g., because it incurs a loss (expensive Swiss watch) or risk (peacock’s tail)
- plausible deniability that it is intended as signaling
- mostly zero-sum on the individual level (if I show that I have more, it implies that others have less in relation)
- signaling burns societal resources
- signaling itself can’t be made more efficient, but the resources spent can be used more efficiently in society (e.g., spending on charity instead of luxury)
More here and here.
What is the Bem Test or Open Sex Role Inventory?
It is a scientific test that measures gender stereotypes.
The test asks questions about traits that are classified as feminine, masculine, and neutral. Unsurprisingly, women score higher on feminine traits and men on masculine ones, but Bem thought that strong feminine *and* masculine traits would be most advantageous for both genders.
My result is consistently average femininity and slightly below-average masculinity. Yes, really. I have done the test six times since 2016, and the two online tests mostly agree. And it fits: I love kids, even when they are babies, and I prefer cooperation over competition.
You can take the online tests here:
https://www.psytoolkit.org/survey-library/sex-role-bem.html
https://openpsychometrics.org/tests/OSRI/
What is a Blame Hole (a term by Robin Hanson)?
Blame holes in blame templates (the social fabric of acceptable behavior) are like plot holes in movies.
Surprising data point mentioned: 1⁄3 of movie budget goes to marketing because blockbusters are a coordination game.
Leadership Ability Determines a Person’s Level of...
Effectiveness.
(Something I realized around twelve years ago: I was limited in what I could achieve as a software engineer alone. That was when I became a software architect and worked with bigger and bigger teams.)
From “The 21 Irrefutable Laws of Leadership” by John C. Maxwell:
Factors That Make a Leader
1) Character – Who They Are – true leadership always begins with the inner person. People can sense the depth of a person’s character.
2) Relationships – Who They Know – with deep relationships with the right people you can become the real leader in an organization.
3) Knowledge – What They Know – information is vital. You need a grasp of the facts to develop an accurate vision for the future.
4) Intuition – What They Feel – leaders seek to recognize and influence intangibles such as energy, morale, timing, and momentum.
5) Experience – Where They’ve Been – the greater your past challenges, the more likely followers will be willing to let you lead.
From The 21 Irrefutable Laws of Leadership by John C. Maxwell
To achieve objective analysis, analysts do not avoid what?
Analysts do not achieve objective analysis by avoiding preconceptions; that would be ignorance or self-delusion. Objectivity is achieved by making basic assumptions and reasoning as explicit as possible so that they can be challenged by others and analysts can, themselves, examine their validity.
PS. Any idea how to avoid the negation in the question?
I started posting life insights from my Anki deck on Facebook a while ago. Yesterday, I stumbled over the Site Guide and decided that these could very well go into my ShortForm too. Here is the first:
Which people who say that they want to change actually will change?
People who blame a part of themselves for a failure do not change.
If someone says, “I’ve got a terrible temper,” he will still hit. If he says, “I hit my girlfriend,” he might stop.
If someone says, “I have shitty executive function,” he will still be late. If he says, “I broke my promise,” he might change.
Source: A LessWrong discussion here.
My son (15) shared this Instagram version of Newcomb’s Problem.
I’m looking for a post on censorship bias (see Wikipedia) that was posted here on LW or possibly on SSC/ACX, but a search for “censorship bias” doesn’t turn up anything. Googling for it turns up this:
https://www.theatlantic.com/business/archive/2012/05/when-correlation-is-not-causation-but-something-much-more-screwy/256918/
Anybody can help?
Philosophy with Children—In Other People’s Shoes
“Assume you promised your aunt to play with your nieces while she goes shopping and your friend calls and invites you to something you’d really like to do. What do you do?”
This was the first question I asked my two oldest sons this evening as part of the bedtime ritual. I had read about Constructive Development Theory and wondered if and how well they could place themselves in other persons’ shoes and what played a role in their decision. How they’d deal with it. A good occasion to have some philosophical talk. This is the (shortened) dialog that ensued:
The immediate answer by A: “I will watch after the girls.”
Me: “Why?”
A: “Because I promised it.”
B: “Does A also promise it and get a call?”
Me: “This is about your nieces and your friend, not about your brother.”
B: “But I need this for my answer.”
Me: “I don’t see why, but OK, assume that he is not involved.”
B: “Because I would ask him whether he might play with the girls in exchange for a favor.”
Me: “OK, but please assume that he is away.”
B: “Then I could ask my aunt whether somebody else can watch the girls or whether I could do it together with my friend.”
Me: “Please assume that she doesn’t find somebody and that she doesn’t want somebody she doesn’t know in her house.”
B: “Then I’d do it.”
Me: “Why?”
B: “Because I promised it. I’d tell my friend that we can do it another time.”
We had another scenario: “Imagine that you and a fellow pupil C are guests at a friend and having a meal. You know that C is from a family that is very strict about not eating a kind of food that you like very much. Would you advise C to eat it or not?”
A (quickly): “I’d advise to not eat it.”
Me: “Why?”
A: “I like rules.”
B (after some consideration): “I’d advise to follow their heart.”
Me: “And if you were C?”
B: “I’d at least try a bit.”
(this was followed with a discussion about possible long-term consequences)
For me, it was still not clear whether this implied that he followed only his preferences or that he considered them in the context of the rules of the family. So I proposed a setting where he had to imagine being in another country with different laws. We settled on a rule he accepts here (indemnification) but that was much harsher in the other country. He asked whether he had the same feelings as here, which, after some clarification, I confirmed. He argued that he wouldn’t like the rule in the other country because it set questionable incentives: “If the punishment is that strong, that tells people that it is OK to punish equally strongly normally.”
Philosophy with Children—Mental Images
One time my oldest son asked me to test his imagination. Apparently, he had played around with it and wanted some outside input to learn more about what he could do. We had talked about https://en.wikipedia.org/wiki/Mental_image before and I knew that he could picture moving scenes composed of known images. So I suggested
a five with green and white stripes—diagonally. That took some time—apparently, the green was difficult for some reason; he had to converge on it from black via dark green
three mice
three mice, one yellow, one red, and one green
the three colored mice running behind each other in circles (all no problem)
he himself
he himself in a mirror looking from behind (no problem)
two almost parallel mirrors with him in between (he claimed to see his image infinitely repeated; I think he just recalled such an experiment we did another time).
a street corner with him on one side and a bike leaning against the other wall, with the handlebar facing the corner and a bicycle bell on the left side, such that he cannot see the bike.
ditto, with him looking into a mirror held in front of him so that he can see the bike around the corner.
The latter took quite some time, partly because he had to assign colors and such so that he could fully picture this and then the image in the mirror. I checked by asking where the handlebar and the bell were. I had significant difficulties imagining this and correctly placing the bell. I noticed that it is easier to just see the bell once the image in the mirror has gained enough detail (the walls in front of and behind me, the corner, the bike leaning against the corner, the handlebar).
I also asked for a square circle which got the immediate reply that it is logically impossible.
If you have difficulties doing these (or judge them trivial): this is one area where human experience varies a lot. So this is not intended to provide a reference point for ability but an approach to teach human difference and reflection, and, yes, also to practice imagination—a useful tool if you have it. If not, you might be interested in which universal human experiences you are missing without realizing it.
Origins of Roles
The origin of the word role is in the early 17th century: from French rôle, from obsolete French roule ‘roll’, referring originally to the roll of paper on which the actor’s part was written (the same is the case in other languages e.g. German).
The concept of a role you can take on and off might not have existed in general use long before that. I am uncertain about this thesis, but from the evidence I have seen so far, I think this role concept could be the result of adaptations to the increasing division of labor. Before that, people surely did sometimes behave as a father, carpenter, or soldier—but I contend not because they were consciously switching roles (though some individuals may have reflected sufficiently, I presume there was no common knowledge or language for ‘role switching’). The only persons regularly in a different role were actors. It may be that in earlier times it was even necessary to wear a mask to allow actors to take on a role convincingly, to help their audience, who otherwise might not have made the mental leap of treating them differently from the acted character.
The Cognitive Range of Roles
A role occupies a range of abstraction between professions and automation. In a profession, one person masters all the mental and physical aspects of a trade and can apply them holistically, from small details of handling material imperfections to the organization of the guild. At the border to automation, a worker is reduced to an executor of not-yet-automated tasks. The expectations on a master craftsman are much more complex than those on an assembly-line worker.
With more things getting automated, capacity is freed to automate more complex skills, and so on. This can be seen in real time with software: today a lot of software is (still) built semi-manually (some build tool is started to compile and bundle files into an executable or installer). Every developer knows how to do this. It is a simple role to take on: “Can you do the build, please?” As the company and the software grow, there is a phase where either this becomes a more fleshed-out role, “build master”, maybe even a dedicated job position—or, more likely, it will be automated. And as the automation—the build process or “pipeline”—becomes more complicated, you need the role of a build software expert. At this point, some tasks previously requiring skill and experience have been standardized and automated, and what remains is the more complex task of managing the automation.
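As a concrete, hypothetical illustration of the kind of semi-manual build meant above, here is the sort of script one developer might run by hand before the task grows into a “build master” role or an automated pipeline; the tool and directory names are made up.

```python
# Hypothetical semi-manual "build" script: compile, test, and bundle in one go.
# Tool names ("npm run compile", the "build" output directory) are illustrative only.
import pathlib
import shutil
import subprocess

def build():
    dist = pathlib.Path("dist")
    shutil.rmtree(dist, ignore_errors=True)                 # start from a clean output directory
    dist.mkdir()
    subprocess.run(["npm", "run", "compile"], check=True)   # compile the sources
    subprocess.run(["npm", "test"], check=True)             # run the test suite
    shutil.make_archive("dist/app-bundle", "zip", "build")  # bundle the compiled output
    print("Build finished: dist/app-bundle.zip")

if __name__ == "__main__":
    build()
```

Once several variants of such a script need coordinating, the knowledge concentrates in a role; once it runs on every commit in a pipeline, the remaining role is maintaining the automation.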
Over time, the range between what is fully automatable and what humans can do in an unpredictable environment shrinks. With AGI, there will be no roles left for humans, at least no roles in the sense used here. I think low-skilled people already feel this.
When trying to get an overview of what is considered a role I made this table:
CDF refers to the Constructive Developmental Framework, as explained e.g. here (nice pics) and also on LW here.
I’m very unsure about the levels I assigned. Also, I got feedback that the degree to which roles can be changed and were changed may be different from how I interpret the historical record.
In any sizable organization, you can find a lot of roles, and a lot of people filling these roles—often multiple ones on the same day. Why do we use so many and such fine-grained roles? Why don’t we continue with the coarse-grained and more stable occupations? Because the world got more complicated, everybody got more specialized, and roles help with that. Division of labor means breaking down work previously done by one person into smaller parts that are done repeatedly in the same way—and can be assigned to actors: “You are now the widget-maker.” This works best when the tasks are easy to learn, so it is easy to find someone to do them. But humans are not plug-compatible and actual requirements vary, so there is always some training required, which can be amortized over repeatedly performing the task—a role. So roles make sense structurally—but why do people actually do what is expected of them and not just follow their own agenda? This is an alignment problem—in this case between the organization and the agent—and we might learn something about the AI alignment problem from it.
What are the common aspects of these labor-sharing roles (in the following called simply roles)?
One common property of a role is that there is common knowledge among the involved persons about the role. Primarily, this shared understanding is about the tasks that the agent acting in the role can be expected to perform, as well as about the goals to be achieved, the limits to be observed, and other expectations. These expectations are usually already common knowledge long beforehand, or they are established when the agent takes on the role.
The second important aspect of a role is that it can be taken on and off, or assigned. Sometimes this happens explicitly, when an employee is designated or volunteers to be a moderator. Sometimes this happens automatically or habitually in order to perform certain actions, e.g., when a forum moderator writes a comment as a regular poster or when a developer assumes root permissions (the sudo lecture).
Other aspects are of interest in the social sciences, e.g., interactions between roles or the social function of roles.
I have enabled Reacts for my Shortform. Let’s see how it goes.