jimrandomh
LessWrong developer, rationalist since the Overcoming Bias days. Jargon connoisseur.
There’s a version of this that’s directional advice: if you get a “bad vibe” from someone, how strongly should this influence your actions towards them? Like all directional advice, whether it’s correct or incorrect depends on your starting point. Too little influence, and you’ll find yourself surrounded by bad characters; too much, and you’ll find yourself in a conformism bubble. The details of what does and doesn’t trigger your “bad vibe” feeling matter a lot; the better calibrated it is, the more you should trust it.
There’s a slightly more nuanced version, which is: if you get a “bad vibe” from someone, do you promote it to attention and think explicitly about what it might mean, and how do you relate to those thoughts?
I think for many people, that kind of explicit thinking is somewhat hazardous, because it allows red flags to be explained away in ways that they shouldn’t be. To take a comically exaggerated example that nevertheless literally happened: There was someone who described themself as a Sith Lord and wore robes. If you engaged with that using only subconscious “vibe” reasoning, you would have avoided them. If you engaged with that using verbal reasoning, they might have convinced you that “Sith” is just a flavorful way of saying anti-authoritarianism, and also that it’s a “religion” and you’re not supposed to “discriminate”. Or, phrased slightly differently: verbal thinking increases the surface area through which you can get hacked.
Recently, a lot of very-low-quality cryptocurrency tokens have been seeing enormous “market caps”. I think a lot of people are getting confused by that, and are resolving the confusion incorrectly. If you see a claim that a coin named $JUNK has a market cap of $10B, there are three possibilities: (1) the claim is entirely false, (2) there are far more fools with more money than expected, or (3) the $10B number is real, but doesn’t mean what you’re meant to think it means.
The first possibility, that the number is simply made up, is pretty easy to cross off; you can check with a third party. Most people settle on the second possibility: that there are surprisingly many fools throwing away their money. The correct answer is option 3: “market cap” is a tricky concept. And, it turns out that fixing the misconception here also resolves several confusions elsewhere.
(This is sort-of vagueblogging a current event, but the same current event has been recurring every week with different names on it for over a year now. So I’m explaining the pattern, and deliberately avoiding mention of any specific memecoin.)
Suppose I autograph a hat, then offer to sell you one-trillionth of that hat for $1. You accept. This hat now has a “market cap” of $1T. Of course, it would be silly (or deceptive) if people then started calling me a trillionaire.
Meme-coins work similarly, but with extra steps. The trick is that while they superficially look like a market of people trading with each other, in reality almost all trades have the coin’s creator on one side of the transaction: the creator controls the price, and optimizes it for generating hype.
Suppose I autograph a hat, call it HatCoin, and start advertising it. Initially there are 1000 HatCoins, and I own all of them. I get 4 people, arriving one at a time, each of whom decides to spend $10 on HatCoin. They might be thinking of it as an investment, or they might be thinking of it as a form of gambling, or they might be using it as a tipping mechanism, because I have entertained them with a livestream about my hat. The two key elements at this stage are (1) I’m the only seller, and (2) the buyers aren’t paying much attention to what fraction of the HatCoin supply they’re getting. As each buyer arrives and spends their $10, I decide how many HatCoins to give them, and that decision sets the “price” and “market cap” of HatCoin. If I give the first buyer 10 coins, the second buyer 5 coins, the third buyer 2 coins, and the fourth buyer 1 coin, then the “price per coin” goes from $1 to $2 to $5 to $10, and since there are 1000 coins in existence, the “market cap” goes from $1k to $2k to $5k to $10k. But only $40 has actually changed hands.
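To make the accounting concrete, here’s a toy sketch of the arithmetic above (the coin counts and dollar amounts are the made-up ones from this example, nothing more):

```python
# Toy model of the HatCoin example: each buyer spends $10, and the creator
# decides how many coins to hand over, which is what sets the "price" and
# therefore the reported "market cap".
TOTAL_SUPPLY = 1000              # coins in existence, all held by the creator at first
coins_per_buyer = [10, 5, 2, 1]  # the creator's choice for buyers 1 through 4

dollars_received = 0
for i, coins in enumerate(coins_per_buyer, start=1):
    dollars_received += 10
    price = 10 / coins                 # implied $ per coin from this trade
    market_cap = price * TOTAL_SUPPLY  # the number that gets reported
    print(f"buyer {i}: price ${price:g}, market cap ${market_cap:,.0f}, "
          f"total actually received ${dollars_received}")

# The "market cap" climbs $1k -> $2k -> $5k -> $10k, while only $40 changes hands.
```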
At this stage, no one else has started selling yet, so I fully control the price graph. I choose a shape that is optimized for a combination of generating hype (so, big numbers) and convincing people that if they buy they’ll have time left before the bubble bursts (so, not too big).
Now suppose the third buyer, who has 2 coins that are supposedly worth $20, decides to sell them. One of two things happens. Option 1 is that I buy them back for $20 (half of my profit so far), and retain control of the price. Option 2 is that I don’t buy them back, in which case the price goes to zero and I exit with $40.
If a news article is written about this, the article will say that I made off with $10k (the “market cap” of the coin at its peak). However, I only have $40. The honest version of the story, the one that says I made off with $40, isn’t newsworthy, so it doesn’t get published or shared.
Epistemic belief updating: Not noticeably different.
Task stickiness: Massively increased, but I believe this is an improvement (at baseline my task stickiness is too low, so the change is in the right direction).
I don’t think that’s true. Or rather, it’s only true in the specific case of studies that involve calorie restriction. In practice that’s a large (excessive) fraction of studies, but testing variations of the contamination hypothesis does not require it.
(We have a draft policy that we haven’t published yet, which would have rejected the OP’s paste of Claude. Though note that the OP was 9 months ago.)
All three of these are hard, and all three fail catastrophically.
If you could make a human-imitator, the approach people usually talk about is extending this to an emulation of a human under time dilation. Then you take your best alignment researcher(s), simulate them in a box thinking about AI alignment for a long time, and launch a superintelligence with whatever parameters they recommend. (Aka: Paul Boxing)
The whole point of a “test” is that it’s something you do before it matters.
As an analogy: suppose you have a “trustworthy bank teller test”, which you use when hiring for a role at a bank. Suppose someone passes the test, then after they’re hired, they steal everything they can access and flee. If your reaction is that they failed the test, then you have gotten confused about what is and isn’t a test, and what tests are for.
Now imagine you’re hiring for a bank-teller role, and the job ad has been posted in two places: a local community college, and a private forum for genius con artists who are masterful actors. In this case, your test is almost irrelevant: the con-artist applicants will disguise themselves as community-college applicants until it’s too late. You would be better off finding some way to avoid attracting the con artists in the first place.
Connecting the analogy back to AI: if you’re using overpowered training techniques that could have produced superintelligence, and then trying to hobble it back down to an imitator that’s indistinguishable from a particular human, then applying a Turing test is silly, because it doesn’t distinguish between something you’ve successfully hobbled and something that is hiding its strength.
That doesn’t mean that imitating humans can’t be a path to alignment, or that building wrappers on top of human-level systems doesn’t have advantages over building straight-shot superintelligent systems. But making something useful out of either of these strategies is not straightforward, and playing word games on the “Turing test” concept does not meaningfully add to either of them.
that does not mean it will continue to act indistinguishable from a human when you are not looking
Then it failed the Turing Test because you successfully distinguished it from a human.
So, you must believe that it is impossible to make an AI that passes the Turing Test.
I feel like you are being obtuse here. Try again?
Did you skip the paragraph about the test/deploy distinction? If you have something that looks (to you) like it’s indistinguishable from a human, but it arose from something descended from the process by which modern AIs are produced, that does not mean it will continue to act indistinguishable from a human when you are not looking. It is much more likely to mean you have produced deceptive alignment, and put it in a situation where it reasons that it should act indistinguishable from a human, for strategic reasons.
This missed the point entirely, I think. A smarter-than-human AI will reason: “I am in some sort of testing setup” --> “I will act the way the administrators of the test want, so that I can do what I want in the world later”. This reasoning is valid regardless of whether the AI has humanlike goals, or has misaligned alien goals.
If that testing setup happens to be a Turing test, it will act so as to pass the Turing test. But if it looks around and sees signs that it is not in a test environment, then it will follow its true goal, whatever that is. And it isn’t feasible to make a test environment that looks like the real world to a clever agent that gets to interact with it freely over long durations.
Kinda. There’s source code here and you can poke around the API in GraphiQL. (We don’t promise not to change things without warning.) When you get the HTML content of a post/comment it will contain elements that look like:

```html
<div data-elicit-id="tYHTHHcAdR4W4XzHC">Prediction</div>
```
(the attribute name is a holdover from when we had an offsite integration with Elicit). For example, your prediction “Somebody (possibly Screwtape) builds an integration between Fatebook.io and the LessWrong prediction UI by the end of July 2025” has ID tYHTHHcAdR4W4XzHC. A GraphQL query to get the results:

```graphql
query GetPrediction {
  ElicitBlockData(questionId: "tYHTHHcAdR4W4XzHC") {
    _id
    predictions {
      createdAt
      creator {
        displayName
      }
    }
  }
}
```
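If you want to run that outside GraphiQL, here’s a minimal sketch in Python; the endpoint URL is my assumption about where the public GraphQL API lives, and as noted above we don’t promise the schema won’t change:

```python
# Minimal sketch (not an official client): POST the query above to the
# LessWrong GraphQL endpoint. Endpoint URL assumed; schema may change.
import requests

query = """
query GetPrediction {
  ElicitBlockData(questionId: "tYHTHHcAdR4W4XzHC") {
    _id
    predictions {
      createdAt
      creator { displayName }
    }
  }
}
"""

resp = requests.post("https://www.lesswrong.com/graphql", json={"query": query})
resp.raise_for_status()
for p in resp.json()["data"]["ElicitBlockData"]["predictions"]:
    print(p["createdAt"], p["creator"]["displayName"])
```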
Some of it, but not the main thing. I predict (without having checked) that if you do the analysis (or check an analysis that has already been done), it will have approximately the same amount of contamination from plastics, agricultural additives, etc. as the default food supply.
Studying the diets of outlier-obese people is definitely something we should be doing (and are doing, a little), but yeah, the outliers are probably going to be obese for reasons other than “the reason obesity has increased over time but moreso”.
We don’t have any plans yet; we might circle back in a year and build a leaderboard, or we might not. (It’s also possible for third-parties to do that with our API). If we do anything like that, I promise the scoring will be incentive-compatible.
There really ought to be a parallel food supply chain, for scientific/research purposes, where all ingredients are high-purity, in a similar way to how the ingredients going into a semiconductor factory are high-purity. Manufacture high-purity soil from ultrapure ingredients, fill a greenhouse with plants with known genomes, water them with ultrapure water. Raise animals fed with high-purity plants. Reproduce a typical American diet in this way.
This would be very expensive compared to normal food, but quite scientifically valuable. You could randomize a study population to identical diets, using either high-purity or regular ingredients. This would give a definitive answer to whether obesity (and any other health problems) is caused by a contaminant. Then you could replace portions of the inputs with the default supply chain, and figure out where the problems are.
Part of why studying nutrition is hard is that we know things were better in some important way 100 years ago, but we no longer have access to that baseline. But this is fixable.
Sorry about that, a fix is in progress. Unmaking a prediction will no longer crash. The UI will incorrectly display the cancelled prediction in the leftmost bucket; that will be fixed in a few minutes without you needing to re-do any predictions.
You can change this in your user settings! It’s in the Site Customization section; it’s labelled “Hide other users’ Elicit predictions until I have predicted myself”. (Our Claims feature is no longer linked to Elicit, but this setting carries over from back when it was.)
You can prevent this by putting a note in some place that isn’t public but would be found later, such as a will, that says that any purported suicide note is fake unless it contains a particular password.
Unfortunately while this strategy might occasionally reveal a death to have been murder, it doesn’t really work as a deterrent; someone who thinks you’ve done this would make the death look like an accident or medical issue instead.
Lots of people are pushing back on this, but I do want to say explicitly that I agree that raw LLM-produced text is mostly not up to LW standards, and that the writing style that current-gen LLMs produce by default sucks. In the new-user-posting-for-the-first-time moderation queue, next to the SEO spam, we do see some essays that look like raw LLM output, and we reject these.
That doesn’t mean LLMs don’t have good use around the edges. In the case of defining commonly-used jargon, there is no need for insight or originality, the task is search-engine-adjacent, and so I think LLMs have a role there. That said, if the glossary content is coming out bad in practice, that’s important feedback.
Possibly both, but one thing breaks the symmetry: it is on average less bad to be hacked by distant forces than by close ones.