Software engineering, parenting, cognition, meditation, other
LinkedIn, Facebook, Admonymous (anonymous feedback)
Gunnar_Zarncke
[^1]: Without loss of generality, the same applies if market B had a closer-to-true probability estimate than A.
You can insert real footnotes in the LW editor by marking text and using the footnote button in the popup menu.
There is a “more” missing at the end of the sentence.
I just saw a method to make more parts of the model human-legible, addressing the main concern.
Reasoning in non-legible latent spaces is a risk that can be addressed by making the latent spaces independently human-interpretable. One such method is LatentQA: Teaching LLMs to Decode Activations Into Natural Language. Such methods have the advantage of making not only the output layer but potentially more parts of the model human-readable.
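As a rough illustration of the idea (my own sketch, not the LatentQA authors' code; the decoder call at the end is hypothetical): extract an intermediate activation and hand it, together with a question, to a decoder model trained to describe activations in natural language.

```python
# Sketch of the activation-decoding idea. The "activation_decoder" at the end
# is a hypothetical model; everything else uses the standard transformers API.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tok("I should double-check this claim before answering.", return_tensors="pt")
with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)

# Take an intermediate latent (layer 6 of the residual stream), not the output layer.
latent = out.hidden_states[6]          # shape: (1, seq_len, hidden_dim)

# A LatentQA-style decoder would take (latent, question) and return an answer such as
# "The model is representing uncertainty about a factual claim."
# answer = activation_decoder(latent, question="What is the model reasoning about?")
```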
Lists, complete lists. You can ask LLMs to give you lists of all kinds of things, and they will tend to be more complete than what you could come up with yourself, whether it is things you should buy, consider, evaluate, or try. For example:
What are proposed explanations/causes for X?
What are suitable replacements for X in Y?
Who are well-known people working in X?
What are criteria to consider when buying X?
What are common names given to X by Y?
And you can often ask for more.
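If you want to do this programmatically, here is a minimal sketch using the OpenAI Python client; the model name, prompts, and the follow-up "list more" turn are just placeholders for the pattern, not recommendations.

```python
# Minimal sketch: ask an LLM for an exhaustive list, then ask for more.
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

def complete_list(topic: str, model: str = "gpt-4o-mini") -> str:
    messages = [
        {"role": "user",
         "content": f"What are criteria to consider when buying {topic}? "
                    "Give a complete list."},
    ]
    first = client.chat.completions.create(model=model, messages=messages)
    messages.append({"role": "assistant", "content": first.choices[0].message.content})
    # "And you can often ask for more."
    messages.append({"role": "user", "content": "What else? List more."})
    more = client.chat.completions.create(model=model, messages=messages)
    return first.choices[0].message.content + "\n\n" + more.choices[0].message.content

print(complete_list("a family car"))
```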
Sure, but you could design the test in a way that makes this more likely, such as in a dialog with AI:
person: “Ask me a question?”
AI: “What is a quorum?”
person: “Wait, I think I remember this. Let me think.”
AI: “What is thinking?”
person: “Thinking is what goes on in people’s minds, e.g., before they speak, or even during. For example, I just noticed that I didn’t know this and wanted to explore options before answering.”
AI: …
If the AI says: “Interesting, that is also what happens for me.” then presumably it has consciousness.
I think the single most important point is: keep a paradigm with human-legible CoT. Most other points are downstream of that. If the CoT is legible, it is possible (and more likely) to notice when it is not faithful and to build monitoring on top. It might be the single simple You Get About Five Words thing that makes it into regulation.
What would be your The Best Textbooks on Every Subject recommendation on heritability?
What do you think about Robin Hanson’s culture and value drift?
https://www.overcomingbias.com/p/how-fix-cultural-drift
And that is why the rewiring of the pubescent brain involves changes that enable that. As the brain can’t hardwire changes to values, which are learned at a high level, it has to go some other way:
Change the ground truth feedback (brain stem rewiring).
Weaken all previous connections (synaptic pruning?).
Reduce learning rate (myelination).
Something else? Changes to other hyperparameters (hormonal changes?).
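In ML terms, those four knobs might look roughly like the sketch below. This is purely illustrative; the model, optimizer, and the mapping of each knob to a specific brain change are the analogy above, not established fact.

```python
# Illustrative sketch of the four "knobs" above as ML-training interventions.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))

# 1. Change the ground-truth feedback (brain-stem rewiring):
#    swap out the objective the system is trained against.
loss_fn = nn.CrossEntropyLoss()   # replace with a different objective

# 2. Weaken all previous connections (synaptic pruning?):
#    shrink all weights and zero out the weakest ones.
with torch.no_grad():
    for p in model.parameters():
        p.mul_(0.9)                 # global weakening
        p[p.abs() < 1e-2] = 0.0     # prune the smallest connections

# 3. Reduce the learning rate (myelination):
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)   # was, say, 1e-2

# 4. Change other hyperparameters (hormonal changes?):
#    e.g. weight decay, momentum, exploration temperature.
for group in optimizer.param_groups:
    group["weight_decay"] = 1e-4
```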
One big element of the dangers of unaligned AI is that it acts as a coherent entity, an agent that has agency and can do things. We could try to remove this property from the models, for example, by gradient routing and ablating. But agents are useful. We want to give the LLM tasks that it executes on our behalf. Can we give it tasks without it being a coherent unit that has potential goals of its own? I think it should be possible to shape the model in a way that it has a reduced form of agency. What forms could this reduced agency take?
Oracle—an oracle that knows and predicts but doesn’t have identity or goals
Delegate—acting without own identity but modeling the identity of the user
Tool/Service/Automation—running a standardized process across all users without “being” that process
I think many would argue that (natural) philosophy is that super science, but while it might have been true at some point, I think it is relatively far from it today.
We can get rid of all of this deception by getting rid of agency. That should be possible with methods based on Gradient Routing or Self-Other Overlap Finetuning or variants thereof. For example, you could use gradient routing to route agency and identity to one part of the model that is later ablated.
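A toy sketch of that idea (my own illustration, not the Gradient Routing authors' implementation; the `routed_param_names` set and the "agentic batch" label are hypothetical): gradients from agency-related training data only update a designated subset of parameters, and that subset is zeroed out after training.

```python
import torch

def routed_step(model, loss, optimizer, routed_param_names, is_agentic_batch):
    """One training step with crude gradient routing."""
    optimizer.zero_grad()
    loss.backward()
    if is_agentic_batch:
        # "Agency" data may only update the designated (later ablated) parameters.
        for name, p in model.named_parameters():
            if name not in routed_param_names and p.grad is not None:
                p.grad.zero_()
    else:
        # All other data must not touch the designated parameters.
        for name, p in model.named_parameters():
            if name in routed_param_names and p.grad is not None:
                p.grad.zero_()
    optimizer.step()

def ablate_agency(model, routed_param_names):
    """After training, remove the part that absorbed the agency/identity updates."""
    with torch.no_grad():
        for name, p in model.named_parameters():
            if name in routed_param_names:
                p.zero_()
```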
The problem is that we want the model to have agency. We want actors that can solve complex tasks, do things on our behalf, and pursue our goals for us.
I see two potential ways out:
link is missing
Much of marketing and sales is intended to make us think fast and go by intuition. The former by using deadlines and the latter by appealing to emotions and how we’d feel about the decision. Or by avoiding the decision altogether, e.g., by making us think past the sale or creating situations where each outcome is a win.
Small variations can make a notable difference with pancakes. I still haven’t managed to reproduce anything even close to my grandma’s pancakes’ sweet spot, though I think the island of stability is somewhere with more eggs and more heat. I usually don’t use baking powder, but it is possible to create some fluffiness with more oil, more heat, and frequent mixing.
I do not follow German/EU politics for that reason. I did follow the US elections out of interest and believed that I would be sufficiently detached and neutral—and it still took some share of attention.
In terms of topics (generally, not EU or US), I think it makes sense to have an idea of crime, healthcare, etc., but not on a day-by-day basis, because there is too much short-term information warfare going on (see below). Following decent bloggers or reading papers about longer-term trends makes sense, though.
Politicians’ political beliefs
Politicians’ personal lives
I think that is almost hopeless without deep inside knowledge. There is too much Simulacrum Level 3 and 4 communication going on. When a politician says, “I will make sure that America produces more oil,” what does that mean? It surely doesn’t mean that the politician will make sure that America produces more oil. It means (or could mean):
The general population hears: “Oil prices will go down.”
Oil-producers hear: “Regulations may be relaxed about producing oil in America.”
Other countries hear: “America wants to send us a signal that they may compete on oil.”
…
Who are the parties the message is directed to, and how will they hear it? It is hard to know without a lot of knowledge about the needed messaging. It is a bit like the stock/crypto market: when you buy (or sell), you have to know why the person who is selling (or buying) your share is doing so. If you don’t know, then likely you are the one making a loss. If you don’t know who the message is directed to, you cannot interpret it properly.
And you can’t go by the literal words. Or rather, the literal words are likely directed to somebody too (probably intellectuals, but what do I know) and likely intended to distract them.
The desire to fit in, to be respected, liked and admired by other people, is one of the core desires that most (virtually all?) people have. It’s approximately on the same level as e.g. the desire to avoid pain.
I think the comparison to pain is correct in the sense that some part of the brain (brainstem) is responding to bodily signals in the same mechanistic way as it is to pain signals. The desire to fit in is grounded in something. Steven Byrnes suggests a mechanism in Neuroscience of human social instincts: a sketch.
We call it “peer pressure” when it is constraining the individual (or at least some of them) without providing perceived mutual value. It is the same mechanism that leads to people collaborating for the common good. The interesting question is which forces or which environments lead to a negative sum game.
some people would just develop a different persona for each group
That is possible, but maybe only more likely if the groups are very clearly separate, such as when you are in a faraway country for a long time. But if you are, e.g., in a multicultural city with many, maybe even overlapping, groups, or where you can’t easily tell which group you are dealing with, it is more difficult to “overfit” and easier to learn a more general strategy. I think universal morality is something like the more general case of this.
Except where it is not, or rather, where it has saturated at a low level. Almost all growth is logistic growth, and it is difficult to extrapolate logistic functions.
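A quick illustration of why the extrapolation is hard (the numbers are made up): in the early phase, logistic data is fit almost perfectly by an exponential, and the two only diverge after the inflection point.

```python
import numpy as np
from scipy.optimize import curve_fit

def logistic(t, K, r, t0):
    return K / (1.0 + np.exp(-r * (t - t0)))

def exponential(t, a, r):
    return a * np.exp(r * t)

t_early = np.linspace(0, 5, 50)
y_early = logistic(t_early, K=100.0, r=1.0, t0=8.0)   # still far below the ceiling K

# An exponential fits the early phase almost perfectly...
(a, r), _ = curve_fit(exponential, t_early, y_early, p0=(0.1, 1.0))

# ...but extrapolates wildly once the true curve saturates.
t_future = np.array([10.0, 12.0, 15.0])
print("exponential forecast:", exponential(t_future, a, r))
print("true logistic value: ", logistic(t_future, 100.0, 1.0, 8.0))
```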