I lurk and tag stuff.
Multicore
I think ALWs (autonomous lethal weapons) are already more of a “realist” cause than a doomer cause. To doomers, they’re a distraction—a superintelligence can kill you with or without them.
ALWs also seem to be held to an unrealistic standard compared to existing weapons. With present-day technology, they’ll probably hit the wrong target more often than human-piloted drones. But will they hit the wrong target more often than landmines, cluster munitions, and over-the-horizon unguided artillery barrages, all of which are being used in Ukraine right now?
The Huggingface deep RL course came out last year. It includes theory sections, algorithm implementation exercises, and sections on various RL libraries that are out there. I went through it as it came out, and I found it helpful. https://huggingface.co/learn/deep-rl-course/unit0/introduction
FYI all the links to images hosted on your blog are broken in the LW version.
You are right that by default prediction markets do not generate money, and this can mean traders have little incentive to trade.
Sometimes this doesn’t even matter. Sports betting is very popular even though it’s usually negative sum.
When it does matter, trading can be stimulated by having someone who wants to know the answer to a question subsidize the market on that question, effectively paying traders to reveal their information. The subsidy can take the form of a bot that bets at suboptimal prices, a cash prize for the best-performing trader, or many other things.
Alternately, there could be traders who want shares of YES or NO in a market as a hedge against that outcome negatively affecting their life or business, who will buy even if the EV is negative, and other traders can make money off them.
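To make the subsidy-bot idea above concrete, here's a minimal sketch with made-up numbers (assuming YES shares that pay out $1 if the event happens): a bot that keeps quoting a stale price hands expected profit to anyone with better information, which is exactly the payment for revealing it.

```python
# Hypothetical illustration: a subsidy bot quoting a suboptimal price
# gives informed traders positive expected value for trading.

def ev_of_buying_yes(true_prob: float, price: float, shares: float = 100) -> float:
    """Expected profit from buying YES shares that pay $1 each if YES resolves."""
    cost = price * shares
    expected_payout = true_prob * shares
    return expected_payout - cost

# The subsidy bot keeps offering YES at 50 cents regardless of the evidence.
# A trader who believes the true probability is 70% has about +$20 EV per 100 shares:
print(ev_of_buying_yes(true_prob=0.70, price=0.50))  # ~20.0

# Without the subsidy, trades happen near the consensus price, so the same
# trader's edge (and incentive to trade) is much smaller:
print(ev_of_buying_yes(true_prob=0.70, price=0.68))  # ~2.0
```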
What are these AIs going to do that is immensely useful but not at all dangerous? A lot of useful capabilities that people want are adjacent to danger. Tool AIs Want to be Agent AIs.
If two of your AIs would be dangerous when combined, clearly you can’t make them publicly available, or someone would combine them. If your publicly-available AI is dangerous if someone wraps it with a shell script, someone will create that shell script (see AutoGPT). If no one but a select few can use your AI, that limits its usefulness.
An AI ban that stops dangerous AI might be possible. An AI ban that allows development of extremely powerful systems but has exactly the right safeguard requirements to render those systems non-dangerous seems impossible.
When people calculate utility they often use exponential discounting over time. If for example your discount factor is 0.99 per year, it means that getting something in one year is only 99% as good as getting it now, getting it in two years is only 99% as good as getting it in one year, etc. Getting it in 100 years would be discounted to 0.99^100 ≈ 36.6% of the value of getting it now.
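A quick arithmetic illustration of the same thing (nothing here beyond compounding the 0.99 factor):

```python
# Exponential time discounting with a discount factor of 0.99 per year.

def discounted_value(value: float, discount_factor: float, years: int) -> float:
    """Present value of receiving `value` after `years` years."""
    return value * discount_factor ** years

print(discounted_value(100, 0.99, 1))    # 99.0   (one year out)
print(discounted_value(100, 0.99, 2))    # ~98.0  (99% of 99%)
print(discounted_value(100, 0.99, 100))  # ~36.6  (a century out)
```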
The sharp left turn is not some crazy theoretical construct that comes out of strange math. It is the logical and correct strategy of a wide variety of entities, and also we see it all the time.
I think you mean Treacherous Turn, not Sharp Left Turn.
Sharp Left Turn isn’t a strategy, it’s just an AI that’s aligned in some training domains being capable but not aligned in new ones.
This post is tagged with some wiki-only tags. (If you click through to the tag page, you won’t see a list of posts.) Usually it’s not even possible to apply those. Is there an exception for when creating a post?
See https://www.lesswrong.com/posts/8gqrbnW758qjHFTrH/security-mindset-and-ordinary-paranoia and https://www.lesswrong.com/posts/cpdsMuAHSWhWnKdog/security-mindset-and-the-logistic-success-curve for Yudkowsky’s longform explanation of the metaphor.
Based on my incomplete understanding of transformers:
A transformer does its computation on the entire sequence of tokens at once, and ends up predicting the next token for each token in the sequence.
At each layer, the attention mechanism gives the stream for each token the ability to look at the previous layer’s output for other tokens before it in the sequence.
The stream for each token doesn’t know if it’s the last in the sequence (and thus that its next-token prediction is the “main” prediction), or anything about the tokens that come after it.
So each token’s stream has two tasks in training: predict the next token, and generate the information that later tokens will use to predict their next tokens.
That information could take many different forms, but in some cases it could look like a “plan” (a prediction about the large-scale structure of the piece of writing that begins with the observed sequence so far from this token-stream’s point of view).
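To make the last two points concrete, here's a toy causal self-attention layer in PyTorch (a bare-bones sketch for illustration only: the Q/K/V projections, multiple heads, MLPs, and so on are omitted, and this is not any real model's code):

```python
import torch
import torch.nn.functional as F

seq_len, d_model, vocab = 5, 16, 100
x = torch.randn(seq_len, d_model)  # one residual stream of activations per token

# Causal attention: position i may only attend to positions j <= i, so a
# token's stream never sees the tokens that come after it.
scores = x @ x.T / d_model**0.5
mask = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1)
scores = scores.masked_fill(mask, float("-inf"))
attended = F.softmax(scores, dim=-1) @ x

# The unembedding produces a next-token distribution at *every* position,
# not just the last one; in training, all of them contribute to the loss.
unembed = torch.randn(d_model, vocab)
logits = attended @ unembed
print(logits.shape)  # (seq_len, vocab): one prediction per token in the sequence
```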
In the blackmail scenario, FDT refuses to pay if the blackmailer is a perfect predictor and the FDT agent is perfectly certain of that, and perfectly certain that the stated rules of the game will be followed exactly. However, with stakes of $1M against $1K, FDT might pay if the blackmailer had a 0.1% chance of guessing the agent’s action incorrectly, or if the agent was less than 99.9% confident that the blackmailer was a perfect predictor.
(If the agent is concerned that predictably giving in to blackmail by imperfect predictors makes it exploitable, it can use a mixed strategy that refuses to pay just often enough that the blackmailer doesn’t make any money in expectation.)
In Newcomb’s Problem, the predictor doesn’t have to be perfect—you should still one-box if the predictor is 99.9% or 95% or even 55% likely to predict your action correctly. But the blackmail scenario is extremely dependent on how many nines of accuracy the predictor has. This makes it less relevant to real life, where you might run into a 55% accurate predictor or a 90% accurate predictor, but never a perfect predictor.
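For reference, the expected-value arithmetic behind that claim, assuming the standard payoffs ($1,000 in the transparent box, $1,000,000 in the opaque box if one-boxing was predicted) and treating the predictor's accuracy as applying to whatever you actually choose, which is how the one-boxing argument runs:

```python
# Newcomb's Problem with an imperfect predictor of accuracy p.

def ev_one_box(p: float) -> float:
    # You get the million only if the predictor correctly foresaw one-boxing.
    return p * 1_000_000

def ev_two_box(p: float) -> float:
    # You always get the $1,000, plus the million only if the predictor
    # *incorrectly* predicted one-boxing.
    return 1_000 + (1 - p) * 1_000_000

for p in (0.55, 0.95, 0.999):
    print(p, ev_one_box(p), ev_two_box(p))

# One-boxing wins whenever p > ~0.5005, so even a 55%-accurate predictor is
# enough; the blackmail case, by contrast, turns on the last few nines.
```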
I’m not familiar with LeCun’s ideas, but I don’t think the idea of having an actor, critic, and world model is new in this paper. For a while, most RL algorithms have used an actor-critic architecture, including OpenAI’s old favorite PPO. Model-based RL has been around for years as well, so probably plenty of projects have used an actor, critic, and world model.
Even though the core idea isn’t novel, this paper getting good results might indicate that model-based RL is making more progress than expected, so if LeCun predicted that the future would look more like model-based RL, maybe he gets points for that.
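For what it's worth, the structural split looks roughly like this (a hypothetical sketch with invented module names and sizes, not code from this paper or from any of LeCun's proposals):

```python
import torch
import torch.nn as nn

act_dim, latent_dim = 2, 32

world_model = nn.Sequential(  # predicts the next latent state from state + action
    nn.Linear(latent_dim + act_dim, 64), nn.ReLU(), nn.Linear(64, latent_dim))
actor = nn.Sequential(        # maps a latent state to an action
    nn.Linear(latent_dim, 64), nn.ReLU(), nn.Linear(64, act_dim))
critic = nn.Sequential(       # estimates the value of a latent state
    nn.Linear(latent_dim, 64), nn.ReLU(), nn.Linear(64, 1))

# "Imagined" rollout: the actor and critic can be trained against the world
# model's predictions rather than only against the real environment.
z = torch.zeros(1, latent_dim)
for _ in range(5):
    a = actor(z)
    z = world_model(torch.cat([z, a], dim=-1))
    value = critic(z)
```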
This tag was originally prompted by this exchange: https://www.lesswrong.com/posts/qCc7tm29Guhz6mtf7/the-lesswrong-2021-review-intellectual-circle-expansion?commentId=CafTJyGL5cjrgSExF
Merge candidate with Philosophy of Language?
Things that probably actually fit into your interests:
A Sensible Introduction to Category Theory
Most of what 3blue1brown does
Videos that I found intellectually engaging but are far outside of the subjects that you listed:
Cursed Problems in Game Design
Disney’s FastPass: A Complicated History
Building a 6502-based computer from scratch (playlist)
(I am also a jan Misali fan)
The preview-on-hover for those manifold links shows a 404 error. Not sure if this is Manifold’s fault or LW’s fault.
One antifeature I see promoted a lot is “It doesn’t track your data”. And this seems like it actually manages to be the main selling point on its own for products like DuckDuckGo, Firefox, and PinePhone.
The major difference from the game and movie examples is that these products have fewer competitors, with few or none sharing this particular antifeature.
Antifeatures work as marketing if a product is unique or almost unique in its category for having a highly desired antifeature. If there are lots of other products with the same antifeature, the antifeature alone won’t sell the product. But the same is true of regular features. You can’t convince your friends to play a game by saying “it has a story” or “it has a combat system” either.
On first read, I was annoyed at the post for criticizing futurists for being too certain in their predictions while also throwing out, and refusing to grade, any prediction that expressed uncertainty, on the grounds that saying something “may” happen is unfalsifiable.
On reflection these two things seem mostly unrelated, and for the purpose of establishing a track record “may” predictions do seem strictly worse than either predicting confidently (which allows scoring % of predictions right), or predicting with a probability (which none of these futurists did, but allows creating a calibration curve).
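For illustration, a calibration curve is just grouping predictions by their stated probability and checking how often each group came true; the numbers below are made up, since none of the futurists actually gave probabilities:

```python
from collections import defaultdict

# (stated probability, did it come true?) -- hypothetical data
predictions = [(0.9, True), (0.9, True), (0.9, False),
               (0.6, True), (0.6, False), (0.1, False)]

buckets = defaultdict(list)
for p, outcome in predictions:
    buckets[p].append(outcome)

for p in sorted(buckets):
    outcomes = buckets[p]
    print(f"stated {p:.0%}: came true {sum(outcomes) / len(outcomes):.0%} "
          f"of the time (n={len(outcomes)})")
```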
With such a vague and broad definition of power fantasy, I decided to brainstorm a list of ways games can fail to be a power fantasy.

- Mastery feels unachievable.
  - It seems like too much effort: cliff-shaped learning curves, thousand-hour grinds, old PvP games where every player still around will stomp a noob like you flat.
  - The game feels unfair: excessive RNG, “Fake Difficulty”, or “pay to win”.
- The power feels unreal, success too cheaply earned.
  - The game blatantly cheats in your favor even when you didn’t need it to.
  - Poor game balance leading to hours of trivially easy content that you have to get through to reach the good stuff.
- Mastery doesn’t feel worth trying for.
  - Games where the gameplay isn’t fun and there’s no narrative or metagame hook making you want to do it.
  - The Diablo 3 real-money auction house showing you that your hard-earned loot is worth pennies.
- There is no mastery to try for in the first place.
  - Walking simulators, visual novels, etc. Walking simulators got a mention in the linked article. They aren’t really “failing” at power fantasy, just trying to do something different.