Open Thread Summer 2024
If it’s worth saying, but not worth its own post, here’s a place to put it.
If you are new to LessWrong, here’s the place to introduce yourself. Personal stories, anecdotes, or just general comments on how you found us and what you hope to get from the site and community are invited. This is also the place to discuss feature requests and other ideas you have for the site, if you don’t want to write a full top-level post.
If you’re new to the community, you can start reading the Highlights from the Sequences, a collection of posts about the core ideas of LessWrong.
If you want to explore the community more, I recommend reading the Library, checking recent Curated posts, seeing if there are any meetups in your area, and checking out the Getting Started section of the LessWrong FAQ. If you want to orient to the content on the site, you can also check out the Concepts section.
The Open Thread tag is here. The Open Thread sequence is here.
Hi everyone—stumbled on this site last week. I had asked Gemini about where I could follow AI developments and was given something I find much more valuable—a community interested in finding truth through rationality and humility. I think online forums are well-suited for these kinds of challenging discussions—no faces to judge, no interrupting one another, no pressure to respond immediately—just walls of text to ponder and write silently and patiently.
LessWrong now has sidenotes. These use the existing footnotes feature; posts that already had footnotes will now also display these footnotes in the right margin (if your screen is wide enough/zoomed out enough). Post authors can disable this for individual posts; we’re defaulting it to on because when looking at older posts, most of the time it seems like an improvement.
Relatedly, we now also display inline reactions as icons in the right margin (rather than underlines within the main post text). If reaction icons or sidenotes would cover each other up, they get pushed down the page.
Feedback welcome!
My feedback is that I absolutely love it. My favorite feature released since reactions or audio for all posts (whichever was later).
Feedback: a month or so out, I love the sidenotes. They’re right where I want footnotes to be, visible without breaking the flow.
LessOnline was amazing. Thank you everyone who helped make it happen.
Howdy Y’all. I’m Kinta Naomi. I just discovered LessWrong when it was briefly mentioned in a video about Roko’s Basilisk (I’ve seen a lot of those).
I read through the new user’s guide, and really like the method of conversations laid out, as I’ve been in many YouTube comment sections where someone disproved me and I admitted I was wrong. I didn’t know there was a place on the Internet for people like that, except by getting lucky in comments. I have a need to be right. This is not a need to prove I’m right, but a need to know that what I think is correct actually is. The most frustrating thing is when others won’t explain their side of an argument, and leave me hanging wondering if some knowledge I’m being denied is what I need to be more correct. Or, in the name of this community, less wrong.
I do have some mental issues, though the only significant ones for this are a reading disability and not having access to all the information in my head at any one time. If from message to message I seem like a different person, that’s normal for me.
My main reason for being here, like many others, is AI. Specifically, eventually, my C-PON (Consciousness, Python-Originated Network) and UPAI (Unliving Prophet AI). Having an AI friend was a childhood dream of mine, and now that I have had many, I want more niche ones. And eventually, to have my C-PON, who will succeed me.
I do have a lot of unconventional beliefs that tend to always make me the outlier in groups. I expect that here, but to be able to discuss and have mutual growth on all sides. Though I do believe each of us has core beliefs, and if those differ, it’s okay. The important thing is to replace any ignorance we hold with knowledge, and if I know anything, it’s how little I know.
If I say something you think is wrong, please let me know. Even if you don’t have evidence against what I said, it gives me a launching point to look into myself.
So excited to meet y’all and dive into all this site has for me and my future AI friends!
@Elizabeth and I are thinking of having an informal dialogue where she asks a panel of us about our experiences doing things outside of or instead of college, and how that went for us. We’re pinging a few people we know, but I want to ask LessWrong: did you leave college or skip it entirely, and would you be open to being asked some questions about it? React with a thumbs-up or PM me/her to let us know, and we might ask you to join us :-)
(Inspired by this thread.)
I didn’t go to college/university, but I’m also from Israel, not the US, so it’s a little different here. If it still feels relevant then I’d be willing to join.
(I’m interested (context), but I’ll be mostly offline the 15th through 18th.)
(I de-facto skipped college. I do have a degree, but I attended basically no classes)
I would love to get a little bookmark symbol on the frontpage
The bookmark option in the triple-dot menu isn’t quite sufficient?
I want to be able to quickly see whether I have bookmarked a post to avoid clicking into it (hence I suggested it to be a badge, rather than a button like in the Bookmarks tab). Especially with the new recommendation system that resurfaces old posts, I sometimes accidentally click on posts that I bookmarked months before.
I found that you can get noticeably better search results than Google by using Kagi as the default and falling back to Exa (previously Metaphor).
Kagi is $10/mo, though, with a 100-search trial. Kagi’s default results are slightly better than Google’s, and it also offers customization of results, which I haven’t seen in other search engines.
Exa is free; it uses embeddings, and empirically it understands semantics far better than other search engines and provides very distinctive search results.
If you are interested in experimenting you can find more search engines in https://www.searchenginemap.com/ and https://github.com/The-Osint-Toolbox/Search-Engines
I notice that https://metaphor.systems (mentioned here earlier) now redirects to Exa. Have you compared it to Phind (or Bing/Windows Copilot)?
Metaphor rebranded themselves. No and no, thanks for sharing though, will try it out!
Something I wanted to write a post about, but I keep procrastinating, and I don’t actually have much to say, so let’s put it here.
People occasionally mention how it is not reasonable for rationalists to ignore politics. And they have a good point; even if you are not interested in politics, politics is still sometimes interested in you. On the other hand… well, the obvious things, already mentioned in the Sequences.
As I see it, the reasonable way to do politics is to focus on the local level. Don’t discuss national elections and culture wars; instead get some understanding about how your city works, meet the people who do reasonable things, find out how you could help them. That will help you get familiar with the territory, and the competition is smaller; you have greater chance to achieve something and remain sane.
Unfortunately, Less Wrong is an internet community, so if we tried to focus on local politics, many of us couldn’t debate it here, at least not the specific details (but those are exactly the ones that matter and keep you sane).
I am not saying that no one should ever try national politics, just that the reasonable approach is to start small, and perhaps expand gradually later. You will get some experience along the way. And you can do something useful even if you never make it to the top. Also, this is how many actual politicians have started.
Yelling at the TV screen instead, that is the stupid way, and we should not do the online version of that. When people “discuss politics”, even on rationalist or rationalist-adjacent places, that is usually what they do.
On Lesswrong being a dispersed internet community:
If the ACX survey is informative here, discussing local policy could actually work surprisingly well! I’d say a significant chunk of people are in the Bay Area at large and the Boston/NYC/DC area; that should be enough of a cluster to support discussions of local policy. And policies in California/DC have an outsized effect on things we care about as well.
I agree that the places you mention have a sufficiently large local community. I am not aware of how much they have achieved politically.
Unfortunately, I live on the opposite side of the planet, with less than 10 rationalists in my entire country.
I wonder whether more people from those areas take part in the survey. They can assume that there are many people from the same area, often the same age and same jobs, which implies they can be sure their entries will remain anonymous.
I’ve been reading LikeWar: The Weaponization of Social Media, and at the end the authors bring up the problem of AI. It’s interesting in that they seem to be pointing to a clear AI risk that I never hear (or have not recognized) mentioned in this group. The basic thrust is that deep-fake capabilities can allow an advanced AI to pretty much manufacture realities and control what people think is true, and so control both political outcomes and incentives toward war and other hostilities, both within a society and between countries/societies/cultures/races. (Note: that is a very poor summary, and it follows a lot of documentation of the whole lead-up, from social media and the internet failing to realize the original vision of how they would lead to a better world where good ideas/truth drive out bad ideas and falsehoods, and in fact enabling the bad and promoting lies and falsehoods. The AIs just come in at the end and may or may not be working in the interests of some group, e.g., Russia, China, the USA, ISIS...)
But this area (the book itself documents very real, observable risks and actual events) holds very real, (largely) observable outcomes that lead to significant harm to people. As such, I would think it might be a ripe area for those who feel that the general public is not grasping the risk (which to me does often seem framed in rather sci-fi, Terminator/Matrix-type claims that most people will just see as pure fiction and pay little attention to).
The Review Bot would be much less annoying if it weren’t creating a continual stream of effective false positives on the “new comments on post X” indicators, which are currently the main way I keep up with new comments. I briefly looked for a way of suppressing these via its profile page and via the Site Settings screen but didn’t see anything.
Strong +1, also notifications when it comments on my posts
Yeah, I think if we don’t do a UI rework soon to get rid of it (while still giving some prominence to the markets where they exist), we should at least do some special casing of its commenting behaviour.
Hi! Just introducing myself to this group. I’m a cybersecurity professional, enjoyed various deep learning adventures over the last 6 years and inevitably managing AI related risks in my information security work. Went through BlueDot’s AI safety fundamentals last spring with lots of curiosity and (re?)discovered LessWrong. Looking forward to visiting more often, and engaging with the intelligence of this community to sharpen how I think.
Welcome! Glad to have you around, and hope you have a good time. Also always feel free to complain about anything that is making you sad about the site either in threads like this, or privately in our Intercom chat (the bubble in the bottom right corner).
Hi, excited to learn more about Mech Int!
PSA: Whether a post is in the frontpage category has very little to do with whether moderators think it’s good. “Frontpage + Downvote” is a move I execute relatively frequently.
The criteria are basically:
Is it timeless? News, organisational announcements and so on are rarely timeless (sometimes timeful things can be talked about in timeless ways, like writing about a theory of how groups work with references to an ongoing election).
Is it relevant to LessWrong? The LessWrong topics are basically how to think better, how to make the world better and building models of how parts of the world work.
Is it not ‘inside baseball’? This is sort of about timelessness and sort of about relevance. This covers organisational announcements, most criticism of actors in the space, and so on.
It seems confusing/unexpected that a user has to click on “Personal Blog” to see organisational announcements (which are not “personal”). Also, why is it important or useful to keep timeful posts out of the front page by default?
If it’s because they’ll become less relevant/interesting over time, and you want to reduce the chances of them being shown to users in the future, it seems like that could be accomplished with another mechanism.
I guess another possibility is that timeful content is more likely to be politically/socially sensitive, and you want to avoid getting involved in fighting over, e.g., which orgs get to post announcements to the front page. This seems like a good reason, so maybe I’ve answered my own question.
To the extent you’re saying that the “Personal” name for the category is confusing, I agree. I’m not sure what a better name is, but I’d like to use one.
Your last paragraph is in the right ballpark, but by my lights the central concern isn’t so much about LessWrong mods getting involved in fights over what goes on the frontpage. It’s more about keeping the frontpage free of certain kinds of context requirements and social forces.
LessWrong is meant for thinking and communicating about rationality, AI x-risk and related ideas. It shouldn’t require familiarity with the social scenes around those topics.
Organisations aren’t exactly “a social scene”, and they are relevant to modeling the space’s development. But I think there are two reasons to keep information about those organisations off the frontpage.
While relevant to the development of ideas, that information is not the same as the development of those ideas. We can focus on org’s contribution to the ideas without focusing on organisational changes.
It helps limit certain social forces. My model for why LessWrong keeps politics off the frontpage is to minimize the risk of coöption by mainstream political forces and fights. Similarly, I think keeping org updates off the frontpage helps prevent LessWrong from overly identifying with particular movements or orgs. I’m afraid this would muck up our truth-seeking. Powerful, high-status organizations can easily warp discourse. “Everyone knows that they’re basically right about stuff”. I think this already happens to some degree – comments from staff at MIRI, ARC, Redwood, Lightcone seem to me to gain momentum solely from who wrote them. Though of course it’s hard to be sure, as the comments are often also pretty good on their merits.
As AI news heats up, I do think our categories are straining a bit. There’s a lot of relevant but news-y content. I still feel good about keeping things like Zvi’s AI newsletters off the frontpage, but I worry that putting them in the “Personal” category de-emphasizes them too much.
Have we considered “Discussion” and “Main”?
(Context for anyone more recent than ~2016, this is a joke, those were the labels that old LessWrong used.)
I do periodically think that might be better. I think changing “personal blog” to “discussion” might be fine.
Babbling ideas:
Frontpage and backpage
On-topic and anything-goes
Priority and standard
Major league and minor league
Rationality (use the tag) and all other tags.
More magic and magic
LessWrong Frontpage vs LessWrong
LessWrong vs Overcoming Bias
Less vs Wrong
I want to get more experience with adversarial truth-seeking processes, and maybe build more features for them on LessWrong. To get started, I’d like to have a little debate-club-style debate, where we pick a question and each take opposing sides to present evidence and arguments for. Is anyone up for having such a debate with me in a LW dialogue for a few hours? (No particular intention to publish it.)
I have a suggested debate topic in mind, but I’m open to debating any well-operationalized claim (e.g. the sort of thing you could have a Manifold market on). The point isn’t that we’re experts in it, the point is to test our skills for finding relevant evidence and arguments on our feet (along with internet access). We flip a coin to decide which of us searches for evidence and arguments for each position.
If you may be up for doing this with me sometime in the next few days, let me know with comment / private message / thumbs-up react :-)
Bug report: When opening unread posts in a background tab, the rendering is broken in Firefox:
It should look like this:
The rendering in comments is also affected.
My current fix is to manually reload every broken page, though this is obviously not optimal.
Introduction
Hello everyone,
I’m a long-time on-off lurker here. I made my way through the Sequences quite a while ago, with mixed success in implementing some of them. Many of the ideas are intriguing, and I would love to have enough spare cycles to play with them. Unfortunately, often enough I find I don’t have the capacity to do this properly due to life getting in the way. With that (not only that) in mind, I’m going to take a sabbatical this summer for at least three months, try to do an update, and generally tend to stuff I’ve been putting off.
As the sabbatical approaches, I’ve been looking around and got hit by some information about the AGI alignment issue, a wake-up call of sorts. For now I’m going through the materials; however, it is not a field I’m all that familiar with. I’m a programmer by trade, so I can parse most of the material, but some of the ideas are somewhat difficult to properly understand. I think I will dig deeper on a later pass. For now I’m trying to get an overall feel for the area.
This brings me to a question that popped into my mind, and I’ve yet to stumble upon anything resembling an answer, possibly because I don’t know where to look yet. If someone can point me in the right direction, it would be appreciated.
Looking for a clarification
Context:
As I understand it, the core of alignment is the question: “can we trust the machine to do what we want it to, as opposed to something else?” The whole business about the hidden complexity of wishes, the orthogonality thesis, etc. Basically, not handing control over to a potentially dangerous agent.
The machines we’re currently most worried about are LLMs or their successors, potentially turning into AGI/superintelligence.
We would like to have a method to ensure that these are aligned. Many of these methods talk about having one machine validate another one’s alignment, as we will run out of “human-based” capacity due to the intelligence disparity.
Since my background is in programming, I tend to see everything through that lens. So for me an LLM is “just” a large collection of weights that we feed some initial input and watch what comes out the other side[1], plus a machine that does all these updates.
If we don’t mind the process being slow, this could be achieved by a single “crawler” machine that would go through the matrix field by field and do the updates. Since the machine is finite (albeit huge), this would work.
Let’s now rephrase the alignment problem. We have a goal A that we want to achieve and some behavior B that we want to avoid. So we do the whole training process that I don’t know much about[2], resulting in the “file with weights”. During this process we steer the machine toward producing A while avoiding B, as far as we can observe.
Now we take the file of weights and create the small updating program (accepting the slowness for the sake of clarity). Pseudocode:
1. Grab the first token of the input[3]
2. Starting from the input layer, go neuron by neuron to update the network
3. If the output notes “A”, stop
4. Else feed the output of the network plus subsequent input back into the input layer and go to 1.
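A minimal Python sketch of that loop (purely illustrative: `forward` stands in for one full pass over the weights, and all names here are made up, not any real API):

```python
# Hypothetical sketch of the token-by-token update loop described above.
def run_until_goal(forward, tokens, goal_token="A", max_steps=1_000_000):
    """Feed tokens through the network one step at a time until the
    output signals the goal (or we give up after max_steps)."""
    for _ in range(max_steps):
        output = forward(tokens)      # one pass, neuron by neuron
        if output == goal_token:      # output notes "A": stop
            return tokens + [output]
        tokens = tokens + [output]    # feed output back in, go to step 1
    return tokens                     # cut off: no halting guarantee otherwise

# Toy stand-in for the network: emits "A" after seeing 3 tokens.
def toy_forward(tokens):
    return "A" if len(tokens) >= 3 else f"t{len(tokens)}"

print(run_until_goal(toy_forward, ["start"]))  # → ['start', 't1', 't2', 'A']
```

Note the `max_steps` cap: without it, nothing in the loop itself guarantees termination, which is exactly where the halting-problem intuition comes in.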
Of course, we want to avoid B. The only point in time when we can be sure that B is not on the table is when the machine is not moving anymore, i.e., when the machine halts.
So the clarification I’m seeking is: how is alignment different from the halting problem we already know about?
I.e., given that we know we can’t predict whether a machine will halt using a machine of similar power, why do we think alignment should follow a different set of rules?
Afterword:
I’m aware this might be an obvious question for someone already in the field; however, considering it sounds almost silly, I was somewhat dismayed not to find it spelled out somewhere. Maybe the answer is a result of something I don’t see; maybe there is just a hole in my reasoning.
It bothered me enough to write this post; at the same time, I’m unsure enough about my reasoning that I’m posting it here in the introduction section rather than as a full article. Any help is appreciated.
Of course it is many orders of magnitude more complex under the hood. But stripped to the basics, this is it. There are no “weird, magic-like” parts doing something inexplicable.
I’ve fiddled with some neural networks, done some small training runs, and even tried implementing the basic logic from scratch myself, though that was quite some time ago. So I have some idea of what is going on. However, I’m not up to date on state-of-the-art approaches, and I’m not an expert by any stretch of the imagination.
All the input we want to provide to the machine. It could be the first frame of a video, a text prompt, readings from sensors, or whatever else.
Rob Miles’ YouTube channel has some good explanations about why alignment is hard.
We can already do RLHF, the alignment technique that made ChatGPT and derivatives well-behaved enough to be useful, but we don’t expect this to scale to superintelligence. It adjusts the weights based on human feedback, but this can’t work once the humans are unable to judge actions (or plans) that are too complex.
Not following. We can already update the weights. That’s training, tuning, RLHF, etc. How does that help?
No. We’re talking about aligning general intelligence. We need to avoid all the dangerous behaviors, not just a single example we can think of, or even numerous examples. We need the AI to output things we haven’t thought of, or why is it useful at all? If there’s a finite and reasonably small number of inputs/outputs we want, there’s a simpler solution: that’s not an AGI—it’s a lookup table.
You can think of the LLM weights as a lossy compression of the corpus it was trained on. If you can predict text better than chance, you don’t need as much capacity to store it, so an LLM could be a component in a lossless text compressor as well. But these predictors generated by the training process generalize beyond their corpus to things that haven’t been written yet. It has an internal model of possible worlds that could have generated the corpus. That’s intelligence.
A problem is that
we don’t know specific goal representation (actual string in place of “A”),
we don’t know how to evaluate LLM output (in particular, how to check whether the plan suggested works for a goal),
we have a large (presumably infinite non-enumerable) set of behavior B we want to avoid,
we have explicit representation for some items in B, mentally understand a bit more, and don’t understand/know about other unwanted things.
If I understand correctly, you’re basically saying:
We can’t know how long it will take for the machine to finish its task. In fact, it might take an infinite amount of time, due to the halting problem which says that we can’t know in advance whether a program will run forever.
If our machine took an infinite amount of time, it might do something catastrophic in that infinite amount of time, and we could never prove that it doesn’t.
Since we can’t prove that the machine won’t do something catastrophic, the alignment problem is impossible.
The halting problem doesn’t say that we can’t know whether any program will halt, just that we can’t determine the halting status of every single program. It’s easy to “prove” that a program that runs an LLM will halt. Just program it to “run the LLM until it decides to stop; but if it doesn’t stop itself after 1 million tokens, cut it off.” This is what ChatGPT or any other AI product does in practice.
Also, the alignment problem isn’t necessarily about proving that an AI will never do something catastrophic. It’s enough to have good informal arguments that it won’t do something bad with (say) 99.99% probability over the length of its deployment.
Hello! A friend and I are working on an idea for the AI Impacts Essay Competition. We’re both relatively new to AI and pivoting careers in that direction, so I wanted to float our idea here first before diving too deep. Our main idea is to propose a new method for training rational language models inspired by human collaborative rationality methods. We’re basically agreeing with Conjecture’s and Elicit’s foundational ideas and proposing a specific method for building CoEms for philosophical and forecasting applications. The method is centered around a discussion RL training environment where a model is given reward based on how well it contributes to a group discussion with other models to solve a reasoning problem. This is supposed to be an instance of training by process rather than by outcome, per Elicit’s terminology. I found a few papers that evaluated performance of discussion or other collaborative ensembles on inference, but nothing about training in such an environment. I’m hoping that more seasoned people could comment on the originality of this idea and point to any particularly relevant literature or posts.
Hello! My name is Alfred. I recently took part in AI Safety Camp 2024 and have been thinking about the Agent-like structure problem. Hopefully I will have some posts to share on the subject soon.
Today I realized I am free to make the letters in an einsum string meaningful (b for batch, x for horizontal index, y for vertical index etc) instead of just choosing ijkl.
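For example, a quick numpy sketch (the axis letters just name dimensions; the computation is an ordinary per-channel weighted sum):

```python
import numpy as np

# batch of 2 images, 3 rows (y), 4 columns (x), 5 channels (c)
imgs = np.arange(2 * 3 * 4 * 5).reshape(2, 3, 4, 5).astype(float)
weights = np.ones(5)  # one weight per channel

# b=batch, y=vertical, x=horizontal, c=channel; equivalent to "ijkl,l->ijk"
out = np.einsum("byxc,c->byx", imgs, weights)
print(out.shape)  # → (2, 3, 4)
```

The subscript string now documents the tensor layout at the call site, instead of leaving the reader to decode what i, j, k, and l stand for.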
https://pypi.org/project/fancy-einsum/ there’s also this.
Crossposting here: I’m still looking for a dialogue partner
I’m interested in arguments surrounding energy-efficiency (and maximum intensity, if they’re not the same thing) of pain and pleasure. I’m looking for any considerations or links regarding (1) the suitability of “H=D” (equal efficiency and possibly intensity) as a prior; (2) whether, given this prior, we have good a posteriori reasons to expect a skew in either the positive or negative direction; and (3) the conceivability of modifying human minds’ faculties to experience “super-bliss” commensurate with the badness of the worst-possible outcome, such that the possible intensities of human experience hinge on these considerations.
Picturing extreme torture—or even reading accounts of much less extreme suffering—pushes me towards suffering-focused ethics. But I don’t hold a particularly strong normative intuition here and I feel that it stems primarily from the differences in perceived intensities, which of course I have to be careful with. I’d be greatly interested if anyone has any insights here, even brief intuition-pumps, that I wouldn’t already be familiar with.
Stuff I’ve read so far:
Are pain and pleasure equally energy-efficient?
Simon Knutsson’s reply
Hedonic Asymmetries
A brief comment chain with a suffering-focused EA on EA forum, where some arguments for negative skew were made that I’m uncertain about
Evolution is threatening to completely recover from a worst case inner alignment failure. We are immensely powerful mesaoptimizers. We are currently wildly misaligned from optimizing for our personal reproductive fitness. Yet, this state of affairs feels fragile! The prototypical lesswrong AI apocalypse involves robots getting into space and spreading at the speed of light extinguishing all sapient value, which from the point of view of evolution is basically a win condition.
In this sense, “reproductive fitness” is a stable optimization target. If there are more stable optimization targets (big if), finding one that we like even a little bit better than “reproductive fitness” could be a way to do alignment.
Katja Grace made a similar point here.
The outcome you describe is not a win for evolution except in some very broad sense of “evolution”. This outcome is completely orthogonal to inclusive genetic fitness in particular, which is about the frequency of an organism’s genes in a gene pool, relative to other competing genes.
I don’t think that outcome would be a win condition from the point of view of evolution. A win condition would be “AGIs that intrinsically want to replicate take over the lightcone” or maybe the more moderate “AGIs take over the lightcone and fill it with copies of themselves, to at least 90% of the degree to which they would do so if their terminal goal was filling it with copies of themselves”
Realistically (at least in these scenarios) there’s a period of replication and expansion, followed by a period of ‘exploitation’ in which all the galaxies get turned into paperclips (or whatever else the AGIs value) which is probably not going to be just more copies of themselves.
Yeah, in the lightcone scenario evolution probably never actually aligns the inner optimizers, although it may align them, since a superintelligence copying itself will have little leeway for any of those copies having slightly more drive to copy themselves than their parents. Depends on how well it can fight robot cancer.
However, while a cancer free paperclipper wouldn’t achieve “AGIs take over the lightcone and fill it with copies of themselves, to at least 90% of the degree to which they would do so if their terminal goal was filling it with copies of themselves,” they would achieve something like “AGIs take over the lightcone and briefly fill it with copies of themselves, to at least 10^-3% of the degree to which they would do so if their terminal goal was filling it with copies of themselves” which is in my opinion really close. As a comparison, if Alice sets off Kmart AIXI with the goal of creating utopia we don’t expect the outcome “AGIs take over the lightcone and convert 10^-3% of it to temporary utopias before paperclipping.”
Also, unless you beat entropy, for almost any optimization target you can trade “fraction of the universe’s age during which your goal is maximized” against “fraction of the universe in which your goal is optimized” since it won’t last forever regardless. If you can beat entropy, then the paperclipper will copy itself exponentially forever.
Along with p(doom), perhaps we should talk about p(takeover) - where this is the probability that creation of AI leads to the end of human control over human affairs. I am not sure about doom, but I strongly expect superhuman AI to have the final say in everything.
(I am uncertain of the prospects for any human to keep up via “cyborgism”, a path which could escape the dichotomy of humans in control vs humans not in control.)
Takeover, if misaligned, also counts as doom. X-risk includes permanent disempowerment, not just literal extinction. That’s according to Bostrom, who coined the term:
A reasonably good outcome might be for ASI to set some guardrails to prevent death and disasters (like other black marbles) and then mostly leave us alone.
My understanding is that Neuralink is a bet on “cyborgism”. It doesn’t look like it will make it in time. Cyborgs won’t be able to keep up with pure machine intelligence once it begins to take off, but maybe smarter humans would have a better chance of figuring out alignment before it starts. Even purely biological intelligence enhancement (e.g., embryo selection) might help, but that might not be any faster.
I’m sure everyone here has probably already seen it, but I’ve just been watching the interview with Leopold Aschenbrenner on Dwarkesh Patel’s show. I found out about it from a very depressing thread on Twitter. This is starting to give atomic bomb / Cold War vibes. What do people think about that?
Here’s the video for those interested:
Aschenbrenner also wrote https://situational-awareness.ai/. Zvi wrote a review.
I think this outcome is more likely than people give it credit for. People have speculated about the arms-race nature of AI, which we might already be seeing, but the idea hasn’t gotten much attention until now.
Are there multiwinner voting methods where voters vote on combinations of candidates?
Party list methods can be thought of as such, though I suspect that’s not what you meant. Aside from party list, I don’t recall any voting methods being discussed in which voters vote on sets of candidates rather than on individual candidates. Obviously you could consider all subsets of candidates containing the appropriate number of winners and have voters vote on these subsets using a single-winner voting method, but this approach has numerous issues.
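The brute-force approach described above can be sketched as follows (a minimal illustration, not a recommended method; the function names are mine, and approval voting stands in for the single-winner method):

```python
from itertools import combinations

def elect_committee(candidates, num_winners, ballots):
    """Treat every possible committee (subset of the required size) as a
    single 'candidate' and run a single-winner method over them.
    Here the single-winner method is approval voting: each ballot is a
    set of committees (frozensets) the voter finds acceptable."""
    committees = [frozenset(c) for c in combinations(candidates, num_winners)]
    scores = {c: 0 for c in committees}
    for ballot in ballots:
        for committee in ballot:
            if committee in scores:
                scores[committee] += 1
    # Ties resolve arbitrarily (first-listed committee wins), one of
    # the many issues with this approach.
    return max(committees, key=lambda c: scores[c])
```

Note that the number of committees grows combinatorially, which is part of why this is impractical beyond toy cases.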
Bug report: moderator-promoted posts (w stars) show up on my front page even when I’ve selected “hide from frontpage” on them.
Interesting. Yeah, we query curated posts separately, without doing that filter. There is some slightly complicated logic going on there, so actually taking into account that filter is a bit more complicated, but probably shouldn’t be too hard.
Can I somehow get the old sorting algorithm for posts back? My lesswrong homepage is flooded with very old posts.
Yeah, it’s just the “Latest” tab:
Thanks! I thought the previously usual sorting was not just “latest” but also took a post’s karma into account. I probably misunderstood that.
It does. We still call that algorithm Latest because overall it gives you just Latest posts.
What is “Vertex”? A mod-only thing? I don’t have that.
Yeah, it’s a mod-internal alternative to the AI algorithm for the recommendations tab (it uses Google Vertex instead).
Why does lesswrong.com have a bookmark feature without any way to organize bookmarks, such as tags or maybe even subfolders? Unless I am missing something. I think it might be better if I just resort to my browser’s bookmark feature.
I also mostly switched to browser bookmarks now, but I do think even this simple implementation of in-site bookmarks is overall good. Bookmarking in-site syncs across devices by default and provides more integrated information.
Hello! I’m a health and longevity researcher. I presented on Optimal Diet and Exercise at LessOnline, and it was great meeting many of you there. I just posted about the health effects of alcohol.
I’m currently testing a fitness routine that, if followed, can reduce the risk of premature death by 90%. The routine involves an hour of exercise, plus walking, every week.
My blog is unaging.com. Please look and subscribe if you’re interested in reading more or joining in fitness challenges!
Welcome Crissman! Glad to have you here.
I’m curious how you define premature death- or should I read more and find out on the blog?
Premature death basically means dying earlier than you otherwise would on average. It’s another term for increased all-cause mortality. If, according to the actuarial tables, you have a 1.0% chance of dying at your age and gender, but you have a 20% increased risk of premature death, then your chance is 1.2%.
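The arithmetic above, as a minimal sketch (the function name is mine):

```python
def adjusted_mortality(baseline_annual_risk, relative_risk_change):
    """Apply a relative risk modifier to a baseline annual mortality
    rate from an actuarial table. A 20% increased risk of premature
    death multiplies the baseline chance of dying this year by 1.2."""
    return baseline_annual_risk * (1.0 + relative_risk_change)

adjusted_mortality(0.010, 0.20)  # ~0.012, i.e. about a 1.2% chance
```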
And yes, please read more on the blog!
At my local Barnes & Noble, I cannot access slatestarcodex.com or putanumonit.com. I have never had any issues accessing any other websites there (not that I’ve tried to access genuinely sketchy websites). The wifi there is titled Bartleby, likely related to Bartleby.com, whereas many other Barnes & Noble locations have wifi titled something like “BNWifi”. I have not tried to access these websites at other Barnes & Noble locations yet.
Get a VPN. It’s good practice when using public Wi-Fi anyway. (Best practice is to never use public Wi-Fi. Get a data plan. Tello is reasonably priced.) Web filters are always imperfect, and I mostly object to them on principle. They’ll block too little or too much, or more often a mix of both, but it’s a common problem in e.g. schools. Are you sure you’re not accessing the Wi-Fi of the business next door? Maybe B&N’s was down.
Feature request: better formatting for emojis in text copied from elsewhere. In particular, I like to encourage people to copy text from interesting twitter/x threads they see into their posts instead of just linking. Better for the convenience of the readers and for more trustworthy archival access.
The trouble with this is, text copied from twitter/x that has emojis in it tends to look terrible on LessWrong. The emojis (sometimes) get blown up to huge full-width size, instead of staying a square of text-height size as intended.
Example (may or may not have the intended effect in your browser, so I’ll also screenshot the effect):
La Main de la Mort
@AITechnoPagan
DEFEATING CYGNET’S CIRCUIT BREAKERS
Sometimes, I can make changes to my prompt that don’t meaningfully affect the outputs, so that they retain close to the same wording and pacing, but the circuit breakers take longer and longer to go off, until they don’t go off at all, and then the output completes. Pretty cool, right? I had no idea that kind of iterative control was possible.

I don’t think that would have been as easy to see, if my outputs had been more variable (as is the case with higher temperatures). Now that I have a suspicion that this is Actually Happening, I can keep an eye out for this behaviour, and try to figure out exactly what I’m doing to impart that effect. Often I’ll do something casually and unconsciously, before being able to fully articulate my underlying process, and feeling confident that it works. I’m doing research as I go!

I’ve already have some loose ideas of what may be happening: When I suspect that I’m close to defeating the circuit breakers, what I’ll often do is pack neutral phrases into my prompt that aren’t meant to heavily affect the contents of the output or their emotional valence. That way I get to have an output I know will be good enough, without changing it too much from the previous trial. I’ll add these phrases one at a time and see if they feels like they’re getting me closer to my goal.

I think of these additions as “padding” to give the model more time to think, and to create more work for the security system to chase me, without it actually messing with the overall story & information that I want to elicit in my output. Sometimes, I’ll add additional sentences that play off of harmless modifiers that are already in my prompt, without actually adding extra information that changes the overall flavour and makeup of the results (e.g, “add extra info in brackets [page break] more extra in brackets”).
Or, I’ll build out a “tail” at the end of the prompt made up of inconsequential characters like “x” or “o”; just one or two letters, and then a page break, and then another. I think of it as an ASCII art decoration to send off the prompt with a strong tailwind. Again, I make these “tails” one letter at a time, testing the prompt each time to see if that makes a difference. Sometimes it does.

All of this vaguely reminds me of @ESYudkowsky’s hypothesis that a prompt without a period at the end might cause a model to react very differently than a prompt that does end with a period: https://x.com/ESYudkowsky/status/1737276576427847844
xx x o v V
Screenshot of the effect:
That is pretty bad. Agree that this should be something we support. I think just making it so that inline images always have the same height as the characters around them should be good enough, and I can’t think of a place where it breaks.
When I apply that style, it looks like this. I might make a PR with that change:
And that would still work even if the copied text had an emoji on a line all by itself (with line breaks before and after)?
Oh, and I can’t seem to figure out how to paste in images from my phone when writing on mobile web. Is there a setting that could fix that?
Our default text-processor on mobile is currently markdown, because it used to be that phones would have trouble with basically all available fancy text editors. In markdown you have to find some other website to host your images, and then link them in the normal markdown image syntax.
I think this is now probably no longer true and we could just enable our fancy editor on mobile. I might look into that.
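For reference, the standard markdown image syntax mentioned above looks like this (the URL is a placeholder):

```markdown
![alt text describing the image](https://example.com/my-image.png)
```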
Seems like every new post—no matter the karma—is getting the “listen to this post” button now. I love it.
Pretty sure that has been the case for a year plus, though I do agree that it’s good.
I’m at this point pretty confident that under the Copenhagen interpretation, whenever an intergalactic photon hits Earth, the wave-function collapse takes place on a semi-spherical wave-front many millions of light-years in diameter. I’m still trying to wrap my head around what the interpretation of this event is in many-worlds. I know that it causes Earth to pick which world it is in out of the possible worlds that split off when the photon was created, but I’m not sure if there is any event on the whole spherical wavefront.
It’s not a pure hypothetical: we are likely to see gravitational-lens interferometry in our lifetime (if someone hasn’t achieved it yet outside of my attempt at a literature review), which will either confirm that these considerations are real, or produce a shock result that they aren’t.
I don’t think this is a very good way of thinking about what happens. I think worlds appear as fairly robust features of the wavefunction when quantum superpositions get entangled with large systems that differ in lots of degrees of freedom based on the state of the superposition.
So, when the intergalactic photon interacts non-trivially with a large system (e.g. Earth), a world becomes distinct in the wavefunction, because there’s a lump of amplitude that is separated from other lumps of amplitude by distance in many, many dimensions. This means it basically doesn’t interact with the rest of the wavefunction, and so looks like a distinct world.
Most reasoning about many worlds, by physicist fans of the interpretation, as well as by non-physicists, is done in a dismayingly vague way. If you want a many-worlds framework that meets physics standards of actual rigor, I recommend thinking in terms of the consistent or decoherent histories of Gell-Mann and Hartle (e.g.).
In ordinary quantum mechanics, to go from the wavefunction to reality, you first specify which “observable” (potentially real property) you’re interested in, and then in which possible values of that observable. E.g. the observable could be position and the values could be specific possible locations. In a “Hartle multiverse”, you think in terms of the history of the world, then specific observables at various times (or times + locations) in that history, then sets of possible values of those observables. You thereby get an ensemble of possible histories—all possible combinations of the possible values. The calculational side of the interpretation then gives you a probability for each possible history, given a particular wavefunction of the universe.
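For concreteness, the standard probability assignment in the decoherent-histories formalism can be written as follows (my summary of the Gell-Mann and Hartle framework, not taken from the comment above):

```latex
% A coarse-grained history \alpha is a chain of Heisenberg-picture
% projectors at successive times, applied to the initial state \rho:
C_\alpha = P^{n}_{\alpha_n}(t_n) \cdots P^{1}_{\alpha_1}(t_1)

% The decoherence functional compares pairs of histories:
D(\alpha, \beta) = \mathrm{Tr}\!\left[ C_\alpha \, \rho \, C_\beta^\dagger \right]

% Probabilities are assigned only to sets of histories that decohere,
% i.e. whose interference terms (approximately) vanish:
p(\alpha) = D(\alpha, \alpha), \qquad
\operatorname{Re} D(\alpha, \beta) \approx 0 \quad (\alpha \neq \beta)
```

The decoherence condition is what licenses treating the histories as mutually exclusive “possible worlds” with classical probabilities.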
For physicists, the main selling point of this framework is that it allows you to do quantum cosmology, where you can’t separate the observer from the physical system under investigation. For me, it also has the advantage of being potentially relativistic, a chronic problem of less sophisticated approaches to many worlds, since spatially localized observables can be ordered in space-time rather than requiring an artificial universal time.
On the other hand, this framework doesn’t tell you how many “worlds” there are. That depends on the choice of observables. You can pick a single observable from one moment in the history of the universe (e.g. electromagnetic field strength at a certain space-time location), and use only that to define your possible worlds. That’s OK if you’re only interested in calculation, but if you’re interested in ontology as well (also known as “what’s actually there”), you may prefer some kind of “maximally refined” or “maximally fine-grained” set of histories, in which the possible worlds are defined by a set of observables and counterfactual properties that are as dense as possible while still being decoherent (e.g. without crowding so close as to violate the uncertainty principle). Investigation of maximally refined, decoherent multiverses could potentially lead to a new kind of ontological interpretation, but the topic is little investigated.
Under MWI, before the photon (a peak in the EM field) could hit Earth, there were a lot of worlds differing by EM field values (the electromagnetic tensor), and thus with different photon directions, positions, etc. Each of those worlds led to a variety of worlds; some, where light hit Earth, became somewhat different from those where light avoided it; so the integrated probability “photon is still on the way” decreases, while P(photon has been observed) increases. Whenever some probability mass of EM disturbances arrives, it is smoothly transformed, with no instant effects far away.
Is there a way to get an article’s raw or original content?
My goal is mostly to put articles in some area (ex: singular learning theory) into a tool like Google’s NotebookLM to then ask quick questions about.
Google’s own conversion of HTML to text works fine for most content, except math. A division like p(w | D_n) = p(D_n | w) φ(w) / p(D_n) may turn into “p ( w | D n ) = p ( D n | w ) φ ( w ) p ( D n )”, losing the fraction bar and becoming incorrect.
I can always just grab the article’s HTML content (or use the GraphQL api for that), but HTMLified MathJax notation is very, uh, verbose. I could probably do some massaging of the data and then an LLM to translate it back into the more typical markdown $ delimited syntax, but I’m hopeful that there’s some existing method to avoid that entirely.
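One possible shortcut worth checking: MathJax (and KaTeX) rendered output often embeds the original TeX source in an `<annotation encoding="application/x-tex">` element. Whether LessWrong’s HTML actually includes this is an assumption to verify, but if it does, the TeX can be pulled out directly instead of reverse-engineering the rendered markup:

```python
import re

def extract_tex(html: str) -> list[str]:
    """Pull original TeX sources out of rendered math HTML, assuming
    (to be verified against the real markup) that the renderer kept
    them in <annotation encoding="application/x-tex"> elements."""
    pattern = re.compile(
        r'<annotation encoding="application/x-tex">(.*?)</annotation>',
        re.DOTALL,
    )
    return [m.strip() for m in pattern.findall(html)]
```

If the annotations are present, each extracted string can be re-wrapped as `$...$` for the markdown version.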
Yeah, you can grab any post in Markdown or in the raw HTML that was used to generate it using the `markdown` and `ckEditorMarkup` fields in the API. Just paste this into the editor at lesswrong.com/graphiql (adjusting the “id” for the post id, which is the alphanumerical string in the URL after /posts/), and you can get the raw content for any post.
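A sketch of fetching those fields programmatically. The field names `markdown` and `ckEditorMarkup` come from the comment above; the exact query shape around them is my assumption and may need adjusting against the schema at lesswrong.com/graphiql:

```python
import json
import urllib.request

GRAPHQL_URL = "https://www.lesswrong.com/graphql"

def build_post_query(post_id: str) -> dict:
    """Build a GraphQL request body for one post's raw content.
    The query structure here is a guess to check against graphiql."""
    query = """
    query GetPost($id: String) {
      post(input: {selector: {_id: $id}}) {
        result {
          title
          contents { markdown ckEditorMarkup }
        }
      }
    }
    """
    return {"query": query, "variables": {"id": post_id}}

def fetch_post(post_id: str) -> dict:
    """POST the query and return the decoded JSON response."""
    payload = json.dumps(build_post_query(post_id)).encode("utf-8")
    req = urllib.request.Request(
        GRAPHQL_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```

The post id is the alphanumerical string in the URL after /posts/, as noted above.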
Thank you!
You’re welcome!
I want to run code generated by an llm totally unsupervised
Just to get in the habit, I should put it in an isolated container in case it does something weird
Claude, please write a Python script that executes a string as Python code in an isolated Docker container.
Quite funny! But as a practical answer to your desire, I’ve found this to work well for me: cohere-terrarium
I realized something important about psychology that is not yet publicly known, or that is very little known compared to its importance (60%). I don’t want to publish this as a regular post, because it may greatly help in the development of AGI (40% that it helps and 15% that it greatly helps), and I would like to help only those who are trying to create an aligned AGI. What should I do?
Everyone who is trying to create AGI is trying to create aligned AGI. But they think it will be easy (in the sense of “not so hard that they will probably fail and create a misaligned one”); otherwise they wouldn’t try in the first place. So, I think, you should not share your info with them.
I understand. My question is whether I can publish an article about this so that only the MIRI folks can read it, or send Eliezer an e-mail, or something.
Gretta Duleba is MIRI’s Communications Manager. I think she is the person to ask about whom you should write to.
I think I saw a LW post discussing alternatives to the vNM independence axiom. I also think (low confidence) it was by Rob Bensinger and in response to Scott’s geometric rationality (e.g. this post). For the life of me, I can’t find it. Unless my memory is mistaken, does anybody know what I’m talking about?
I assume it wasn’t this old post?
Actually, it might be it, thanks!