See something I’ve written which you disagree with? I’m experimenting with offering cash prizes of up to US$1000 to anyone who changes my mind about something I consider important. Message me our disagreement and I’ll tell you how much I’ll pay if you change my mind + details :-) (EDIT: I’m not logging into Less Wrong very often now, it might take me a while to see your message—I’m still interested though)
John_Maxwell
Makes sense, thanks.
For whatever it’s worth, I believe I was the first to propose weighted voting on LW, and I’ve come to agree with Czynski that this is a big downside. Not necessarily enough to outweigh the upsides, and probably insufficient to account for all the things Czynski dislikes about LW, but I’m embarrassed that I didn’t foresee it as a potential problem. If I was starting a new forum today, I think I’d experiment with no voting at all—maybe try achieving quality control by having an application process for new users? Does anyone have thoughts about that?
Another possible AI parallel: Some people undergo a positive feedback loop where more despair leads to less creativity, less creativity leads to less problem-solving ability (e.g. P100 thing), less problem-solving ability leads to a belief that the problem is impossible, and a belief that the problem is impossible leads to more despair.
China’s government is more involved in large-scale businesses.
According to the World Economic Forum website:
China is home to 109 corporations listed on the Fortune Global 500 - but only 15% of those are privately owned.
Like, maybe depending on the viewer’s history, the best video to polarize the person is different, and the algorithm could learn that. If you follow that line of reasoning, the system starts to make better and better models of human behavior and how to influence it, without having to “jump out of the system” as you say.
Makes sense.
...there’s a lot of content on YouTube about YouTube, so it could become “self-aware” in the sense of understanding the system in which it is embedded.
I think it might be useful to distinguish between being aware of oneself in a literal sense, and the term “self-aware” as it is used colloquially / the connotations the term sneaks in.
Some animals, if put in front of a mirror, will understand that there is some kind of moving animalish thing in front of them. The ones that pass the mirror test are the ones that realize that moving animalish thing is them.
There is a lot of content on YouTube about YouTube, so the system will likely become aware of itself in a literal sense. That’s not the same as our colloquial notion of “self-awareness”.
IMO, it’d be useful to understand the circumstances under which the first one leads to the second one.
My guess is that it works something like this. In order to survive and reproduce, evolution has endowed most animals with an inborn sense of self, to achieve self-preservation. (This sense of self isn’t necessary for cognition—if you trip on psychedelics and experience ego death, your brain can still think. Occasionally people will hurt themselves in this state since their self-preservation instincts aren’t functioning as normal.)
Colloquial “self-awareness” occurs when an animal looking in the mirror realizes that the thing in the mirror and its inborn sense of self are actually the same thing. Similar to Benjamin Franklin realizing that lightning and electricity are actually the same thing.
If this story is correct, we need not worry much about the average ML system developing “self-awareness” in the colloquial sense, since we aren’t planning to endow it with an inborn sense of self.
That doesn’t necessarily mean I think Predict-O-Matic is totally safe. See this post I wrote for instance.
I suspect the best way to think about the polarizing political content thing which is going on right now is something like: the algorithm knows that if it recommends some polarizing political stuff, there’s some chance you will head down a rabbit hole and watch a bunch more vids. So in terms of maximizing your expected watch time, recommending polarizing political stuff is a good bet. “Jumping out of the system” and noticing that recommending polarizing videos also polarizes society as a whole and gets people to spend more time on YouTube at a macro level seems to require a different sort of reasoning.
For the stock thing, I think it depends on how the system is scored. When training a supervised machine learning model, we score candidate models based on how well they predict past data—data the model itself has no way to affect (except if something really weird is going on?). There doesn’t seem to be much incentive to select a model that makes self-fulfilling prophecies. A model which ignores the impact of its “prophecies” will score better, insofar as the prophecy would’ve affected the outcome.
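Here’s a minimal sketch of the kind of scoring I mean, with toy synthetic data (scikit-learn and the linear model are just stand-ins for whatever is actually used):

```python
# Toy illustration: in supervised training, candidate models are scored against
# historical outcomes that are already fixed, so a model's "prophecy" cannot
# change the label it is scored against.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X_past = rng.normal(size=(1000, 5))        # features observed in the past
y_past = X_past @ rng.normal(size=5)       # outcomes that already happened

model = LinearRegression().fit(X_past, y_past)
predictions = model.predict(X_past)

# The objective only compares predictions to frozen historical labels; nothing
# here rewards a model whose predictions would *cause* the predicted outcome.
mse = np.mean((predictions - y_past) ** 2)
print(mse)
```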
I’m not necessarily saying there isn’t a concern here, I just think step 1 is to characterize the problem precisely.
Not sure if this answers, but the book Superforecasting explains, among other things, that probabilistic thinkers tend to make better forecasts.
Yes, I didn’t say “they are not considering that hypothesis”, I am saying “they don’t want to consider that hypothesis”. Those do indeed imply very different actions. I think one very naturally gives rise to producing counterarguments, and the other one does not.
They don’t want to consider the hypothesis, and that’s why they’ll spend a bunch of time carefully considering it and trying to figure out why it is flawed?
In any case… Assuming the Twitter discussion is accurate, some people working on AGI have already thought about the “alignment is hard” position (since those expositions are how they came to work on AGI). But they don’t think the “alignment is hard” position is correct—it would be kinda dumb to work on AGI carelessly if you thought that position is correct. So it seems to be a matter of considering the position and deciding it is incorrect.
I am not really sure what you mean by the second paragraph. AI is being actively regulated, and there are very active lobbying efforts on behalf of the big technology companies, producing large volumes of arguments for why AI is nothing you have to worry about.
That’s interesting, but it doesn’t seem that any of the arguments they’ve made have reached LW or the EA Forum—let me know if I’m wrong. Anyway I think my original point basically stands—from the perspective of EA cause prioritization, the incentives to dismantle/refute flawed arguments for prioritizing AI safety are pretty diffuse. (True for most EA causes—I’ve long maintained that people should be paid to argue for unincentivized positions.)
What? What about all the people who prefer to do fun research that builds capabilities and has direct ways to make them rich, without having to consider the hypothesis that maybe they are causing harm?
If they’re not considering that hypothesis, that means they’re not trying to think of arguments against it. Do we disagree?
I agree if the government was seriously considering regulation of AI, the AI industry would probably lobby against this. But that’s not the same question. From a PR perspective, just ignoring critics often seems to be a good strategy.
There was an interesting discussion on Twitter the other day about how many AI researchers were inspired to work on AGI by AI safety arguments. Apparently they bought the “AGI is important and possible” part of the argument but not the “alignment is crazy difficult” part.
I do think the AI safety community has some unfortunate echo chamber qualities which end up filtering those people out of the discussion. This seems bad because (1) the arguments for caution might be stronger if they were developed by talking to the smartest skeptics and (2) it may be that alignment isn’t crazy difficult and the people filtered out have good ideas for tackling it.
If I had extra money, I might sponsor a prize for a “why we don’t need to worry about AI safety” essay contest to try & create an incentive to bridge the tribal gap. Could accomplish one or more of the following:
- Create more cross talk between people working in AGI and people thinking about how to make it safe
- Show that the best arguments for not needing to worry, as discovered by this essay contest, aren’t very good
- Get more mainstream AI people thinking about safety (and potentially realizing over the course of writing their essay that it needs to be prioritized)
- Get fresh sets of eyes on AI safety problems in a way that could generate new insights
Another point here is that from a cause prioritization perspective, there’s a group of people incentivized to argue that AI safety is important (anyone who gets paid to work on AI safety), but there’s not really any group of people with much of an incentive to argue the reverse (that I can think of at least, let me know if you disagree). So we should expect the set of arguments which have been published to be imbalanced. A contest could help address that.
In Thinking Fast and Slow, Daniel Kahneman describes an adversarial collaboration between himself and expertise researcher Gary Klein. They were originally on opposite sides of the “how much can we trust the intuitions of confident experts” question, but eventually came to agree that expert intuitions can essentially be trusted if & only if the domain has good feedback loops. So I guess that’s one possible heuristic for telling apart a group of sound craftsmen from a mutual admiration society?
Humans aren’t fit to run the world, and there’s no reason to think humans can ever be fit to run the world.
I see this argument pop up every so often. I don’t find it persuasive because it presents a false choice in my view.
Our choice is not between having humans run the world and having a benevolent god run the world. Our choice is between having humans run the world, and having humans delegate the running of the world to something else (which is kind of just an indirect way of running the world).
If you think the alignment problem is hard, you probably believe that humans can’t be trusted to delegate to an AI, which means we are left with either having humans run the world (something humans can’t be trusted to do) or having humans build an AI to run the world (also something humans can’t be trusted to do).
The best path, in my view, is to pick and choose in order to make the overall task as easy as possible. If we’re having a hard time thinking of how to align an AI for a particular situation, add more human control. If we think humans are incompetent or untrustworthy in some particular circumstance, delegate to the AI in that circumstance.
It’s not obvious to me that becoming wiser is difficult—your comment is light on supporting evidence, violence seems less frequent nowadays, and it seems possible to me that becoming wiser is merely unincentivized, not difficult. (BTW, this is related to the question of how effective rationality training is.)
However, again, I see a false choice. We don’t have flawless computerized wisdom at the touch of a button. The alignment problem remains unsolved. What we do have are various exotic proposals for computerized wisdom (coherent extrapolated volition, indirect normativity) which are very difficult to test. Again, insofar as you believe the problem of aligning AIs with human values is hard, you should be pessimistic about these proposals working, and (relatively) eager to shift responsibility to systems we are more familiar with (biological humans).
Let’s take coherent extrapolated volition. We could try & specify some kind of exotic virtual environment where the AI can simulate idealized humans and observe their values… or we could become idealized humans. Given the knowledge of how to create a superintelligent AI, the second approach seems more robust to me. Both approaches require us to nail down what we mean by an “idealized human”, but the second approach does not include the added complication+difficulty of specifying a virtual environment, and has a flesh and blood “human in the loop” observing the process at every step, able to course correct if things seem to be going wrong.
The best overall approach might be a committee of ordinary humans, morally enhanced humans, and morally enhanced ems of some sort, where the AI only acts when all three parties agree on something (perhaps also preventing the parties from manipulating each other somehow). But anyway...
You talk about the influence of better material conditions and institutions. Fine, have the AI improve our material conditions and design better institutions. Again I see a false choice between outcomes achieved by institutions and outcomes achieved by a hypothetical aligned AI which doesn’t exist. Insofar as you think alignment is hard, you should be eager to make an AI less load-bearing and institutions more load-bearing.
Maybe we can have an “institutional singularity” where we have our AI generate a bunch of proposals for institutions, then we have our most trusted institution choose from amongst those proposals, we build the institution as proposed, then have that institution choose from amongst a new batch of institution proposals until we reach a fixed point. A little exotic, but I think I’ve got one foot on terra firma.
We removed the historical 10x multiplier for posts that were promoted to main on LW 1.0
Are comments currently accumulating karma in the same way that toplevel posts do?
When I read this essay in 2019, I remember getting the impression that approval-extracting vs production-oriented was supposed to be about the behavior of the founders, not the industry the company competes in.
I was using it to refer to “any inner optimizer”. I think that’s the standard usage but I’m not completely sure.
With regard to the editing text discussion, I was thinking of a really simple approach where we resample words in the text at random. Perhaps that wouldn’t work great, but I do think editing has potential because it allows for more sophisticated thinking.
Let’s say we want our language model to design us an aircraft. Perhaps it starts by describing the engine, and then it describes the wings. Standard autoregressive text generation (assuming no lookahead) will allow the engine design to influence the wing design (assuming the engine design is inside the context window when it’s writing about the wings), but it won’t allow the wing design to influence the engine design. However, if the model is allowed to edit its text, it can rethink the engine in light of the wings and rethink the wings in light of the engine until it’s designed a really good aircraft.
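Here’s a very rough sketch of the random-resampling idea, assuming a masked language model via Hugging Face transformers (the model choice, word-level splitting, and number of passes are all arbitrary placeholders):

```python
# Crude Gibbs-style editing: repeatedly mask one randomly chosen word and let
# the model propose a replacement, so later text can influence earlier text
# (unlike plain left-to-right generation).
import random
from transformers import pipeline

fill = pipeline("fill-mask", model="bert-base-uncased")

def resample_once(text: str) -> str:
    words = text.split()
    i = random.randrange(len(words))
    masked = " ".join(words[:i] + [fill.tokenizer.mask_token] + words[i + 1:])
    best = fill(masked)[0]              # top-scoring candidate for the masked slot
    words[i] = best["token_str"].strip()
    return " ".join(words)

draft = "the engine must be powerful enough for the wings to generate lift"
for _ in range(20):                     # many local edits, each seeing the whole draft
    draft = resample_once(draft)
print(draft)
```

It obviously wouldn’t produce aircraft designs, but it shows how information can flow backwards through the text.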
In particular, it would be good to figure out some way of contriving a mesa-optimization setup, such that we could measure if these fixes would prevent it or not.
Agreed. Perhaps we could generate lots of travelling salesman problem instances where the greedy approach doesn’t get you something that looks like the optimal route, then try & train a GPT architecture to predict the cities of the optimal route in order?
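A hypothetical sketch of that data generation (brute force only works for a handful of cities, and the 1.05 gap threshold is just an arbitrary way of requiring the greedy tour to look noticeably non-optimal):

```python
# Generate small random TSP instances where the greedy (nearest-neighbour) tour
# is noticeably worse than the brute-force optimum, and keep the optimal
# visiting order as the training target.
import itertools, math, random

def tour_length(order, pts):
    return sum(math.dist(pts[order[i]], pts[order[(i + 1) % len(order)]])
               for i in range(len(order)))

def greedy_tour(pts):
    order, remaining = [0], set(range(1, len(pts)))
    while remaining:
        nearest = min(remaining, key=lambda j: math.dist(pts[order[-1]], pts[j]))
        order.append(nearest)
        remaining.remove(nearest)
    return order

def optimal_tour(pts):  # brute force; fine for ~8 cities or fewer
    best = min(itertools.permutations(range(1, len(pts))),
               key=lambda perm: tour_length((0,) + perm, pts))
    return [0] + list(best)

examples = []
while len(examples) < 100:
    pts = [(random.random(), random.random()) for _ in range(7)]
    opt = optimal_tour(pts)
    if tour_length(greedy_tour(pts), pts) > 1.05 * tour_length(opt, pts):
        examples.append((pts, opt))     # (cities, optimal order) training pair
```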
This is an interesting quote:
...in our experience we find that lean stochastic local search techniques such as simulated annealing are often the most competitive for hard problems with little structure to exploit.
I suspect GPT will be biased towards avoiding mesa-optimization and making use of heuristics, so the best contrived mesa-optimization setup may be an optimization problem with little structure where heuristics aren’t very helpful. Maybe we could focus on problems where non-heuristic methods such as branch and bound / backtracking are considered state of the art, and train the architecture to mesa-optimize by starting with easy instances and gradually moving to harder and harder ones.
Thanks for sharing!
I also felt frustrated by the lack of feedback my posts got; my response was to write this: https://www.lesswrong.com/posts/2E3fpnikKu6237AF6/the-case-for-a-bigger-audience
Maybe submitting LW posts to targeted subreddits could be high impact?
LessWrong used to have a lot of comments back in the day. I wonder if part of the issue is simply that the number of posts went up, which means a bigger surface for readers to be spread across. Why did the writer/reader ratio go up? Perhaps because writing posts falls into the “endorsed” category, whereas reading/writing comments feels like “time-wasting”. And as CFAR et al helped rationalists be more productive, they let activities labeled as “time-wasting” fall by the wayside. (Note that there’s something rather incoherent about this: if the subject matter of the post was important enough to be worth a post, surely it is also worth reading and commenting on?)
Anyway, here are the reasons why commenting falls into the “endorsed” column for me:
- It seems neglected. See above argument.
- I suspect people actually read comments a fair amount. I know I do. Sometimes I will skip to the comments before reading the post itself.
- Writing a comment doesn’t trigger the same “officialness” anxiety that writing a post does. I don’t feel obligated to do background research, think about how my ideas should be structured, or try to anticipate potential lines of counterargument.
- Taking this further, commenting doesn’t feel like work. So it takes fewer spoons. I’m writing this comment during a pre-designated goof off period, in fact. The ideal activity is one which is high-impact yet feels like play. Commenting and brainstorming are two of the few things that fall in that category for me.
I know there was an effort to move the community from Facebook to LW recently. Maybe if we pitched LW as “just as fun as Facebook, but discussing more valuable things and adding to a searchable/taggable knowledge archive” that could lure people over? IMO the concept of “work that feels like play” is underrated in the rationalist and EA communities.
Unfortunately, even though I find it fun to write comments, I tend to get demoralized a while later when my comments don’t get comment replies themselves :P So that ends up being an “endorsed” reason to avoid commenting.
Related to the earlier discussion of weighted voting allegedly facilitating groupthink: https://www.lesswrong.com/posts/kxhmiBJs6xBxjEjP7/weighted-voting-delenda-est
An interesting litmus test for groupthink might be: What has LW changed its collective mind about? By that I mean: the topic was discussed on LW, there was a particular position on the issue that was held by the majority of users, new evidence/arguments came in, and now there’s a different position which is held by the majority of users. I’m a bit concerned that nothing comes to mind which meets these criteria? I’m not sure it has much to do with weighted voting because I can’t think of anything from LW 1.0 either.