I lurk and tag stuff.
Duplicate of Newsletters.
Yes. The one I described is the one the paper calls FairBot. It also defines PrudentBot, which looks for a proof that the other player cooperates with PrudentBot and a proof that it defects against DefectBot. PrudentBot defects against CooperateBot.
The part about two Predictors playing against each other reminded me of Robust Cooperation in the Prisoner’s Dilemma, where two agents with the algorithm “If I find a proof that the other player cooperates with me, cooperate, otherwise defect” are able to mutually prove cooperation and cooperate.
If we use that framework, Marion plays “If I find a proof that the Predictor fills both boxes, two-box, else one-box” and the Predictor plays “If I find a proof that Marion one-boxes, fill both, else only fill box A”. I don’t understand the math very well, but I think in this case neither agent finds a proof, and the Predictor fills only box A while Marion takes only box B—the worst possible outcome for Marion.
Marion’s third conditional might correspond to Marion only searching for proofs in PA, while the Predictor searches for proofs in PA+1, in which case Marion will not find a proof, the Predictor will, and then the Predictor fills both boxes and Marion takes only box B. But in this case clearly Marion has abandoned the ability to predict the Predictor and has given the Predictor epistemic vantage over her.
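For reference, here is my rough paraphrase of the paper’s Löbian argument for why two FairBots end up cooperating (a sketch, not the paper’s exact formulation; the box symbol abbreviates “provable in PA”):

```latex
% A = "FairBot_1 cooperates with FairBot_2", B = "FairBot_2 cooperates with FairBot_1".
% Each bot cooperates exactly when it finds a proof that the other cooperates:
%   PA |- A <-> Box B   and   PA |- B <-> Box A.
\begin{align*}
  &\mathrm{PA} \vdash \Box(A \wedge B) \rightarrow (\Box A \wedge \Box B) \rightarrow (B \wedge A)
    && \text{(distribute $\Box$, then apply the biconditionals)}\\
  &\mathrm{PA} \vdash A \wedge B
    && \text{(L\"ob's theorem applied to the line above)}
\end{align*}
```

So both proof searches succeed and both FairBots cooperate. The analogous step seems to fail for the Marion/Predictor pair: a proof that the Predictor fills both boxes would make Marion two-box rather than one-box, so the loop doesn’t prove itself, which fits the guess above that neither side finds a proof.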
I think in a lot of people’s models, “10% chance of alignment by default” means “if you make a bunch of AIs, 10% chance that all of them are aligned, 90% chance that none of them are aligned”, not “if you make a bunch of AIs, 10% of them will be aligned and 90% of them won’t be”.
And the 10% estimate just represents our ignorance about the true nature of reality: it’s already true either that alignment happens by default or that it doesn’t; we just don’t know which yet.
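A toy calculation of the gap between those two readings (the specific numbers are just for illustration):

```python
# Toy contrast between the two readings of "10% chance of alignment by default".
# N and p are illustrative numbers, not claims about the real situation.
N = 10    # AIs built
p = 0.10  # credence in alignment-by-default

# Reading 1 (the one I think most models intend): a single unknown fact about reality.
# With probability p every AI in the batch is aligned; with probability 1 - p none are.
prob_all_aligned = p        # 0.10
prob_none_aligned = 1 - p   # 0.90

# Reading 2 (the frequency reading): roughly 10% of the AIs come out aligned.
expected_aligned = p * N    # ~1 aligned, ~9 misaligned, essentially guaranteed

print(prob_all_aligned, prob_none_aligned, expected_aligned)
```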
I generally disagree with the idea that fancy widgets and more processes are the main thing keeping the LW wiki from being good. I think the main problem is that not a lot of people are currently contributing to it.
The things that discourage me from contributing more look like:
-There are a lot of pages. If there are 700 bad pages and I write one really good page, there are still 699 bad pages.
-I don’t have a good sense of which pages are most important. If I put a bunch of effort into a particular page, is that one that people are going to care about?
-I don’t get much feedback about whether anyone saw the page after I edited it—karma for edits basically just comes from the tag dashboard and the frontpage activity feed.
So the improvements I would look for would be like:
-Expose view counts for wiki pages somewhere.
-Some sort of bat-signal on the tag dashboard for when a page is getting a lot of views but still has a bunch of TODO flags set.
-Big high-quality wiki page rewrites get promoted to frontpage or something.
-Someone of authority actually goes through and sets the “High Priority” flag on, say, 20 pages that they know are important and neglected.
-Some sort of event or bounty to drive more participation.
is one of the first results for “yudkowsky harris” on YouTube. Is there supposed to be more than this?
You should distinguish between two senses of “reward signal”: the information that the outer optimization process uses to update the AI’s weights, and the observations the AI gets from the environment, which an inner optimizer within the AI might pay attention to and care about.
From evolution’s perspective, your pain, pleasure, and other qualia are the second type of reward, while your inclusive genetic fitness is the first type. You can’t see your inclusive genetic fitness directly, though your observations of the environment can let you guess at it, and your qualia will only affect your inclusive genetic fitness indirectly by affecting what actions you take.
To answer your question about using multiple types of reward:
For the “outer optimization” type of reward, in modern ML the loss function used to train a network can have multiple components. For example, an update on an image-generating AI might say that the image it generated had too much blue in it, didn’t look enough like a cat, and was easy for the discriminator network to tell apart from a human-generated image. The optimizer would then take a single gradient descent step that improves the model on all of those metrics simultaneously for that input.
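A minimal sketch of what a multi-component loss like that could look like (illustrative PyTorch-style code of my own; the models, loss terms, and constants are made up, not any particular system’s training code):

```python
# Illustrative only: a single update step where the loss has several components,
# so one gradient descent step pushes the generator on all of them at once.
# `generator`, `discriminator`, and `cat_classifier` are assumed to be torch modules;
# the discriminator and classifier are assumed to output probabilities in (0, 1).
import torch

def generator_step(generator, discriminator, cat_classifier, z, optimizer):
    images = generator(z)                      # e.g. (batch, 3, H, W), RGB channels

    # Component 1: "too much blue" -- penalize blue exceeding the other channels.
    blue_excess = images[:, 2].mean() - images[:, :2].mean()
    loss_blue = torch.relu(blue_excess)

    # Component 2: "doesn't look enough like a cat" per a frozen classifier.
    loss_cat = -torch.log(cat_classifier(images) + 1e-8).mean()

    # Component 3: the discriminator can tell it apart from human-made images.
    loss_real = -torch.log(discriminator(images) + 1e-8).mean()

    # One combined objective, one gradient descent step on all metrics at once.
    loss = loss_blue + loss_cat + loss_real
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return float(loss)
```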
For “intrinsic motivation” type rewards, the AI could have any reaction whatsoever to any particular input, depending on what reactions were useful to the outer optimization process that produced it. But in order for an environmental reward signal to do anything, the AI has to already be able to react to it.
This has overtaken the post it’s responding to as the top-karma post of all time.
I’m impressed by the number of different training regimes stacked on top of each other (a rough code sketch of how the stages chain together follows the list):
-Train a model that detects whether a Minecraft video on YouTube is free of external artifacts like face cams.
-Then feed the good videos to a model, trained on data from contractors, that guesses which keys are being pressed in each frame.
-Then use the videos and the guessed inputs to train a model that, in any game situation, presses whatever inputs it guesses a human would be most likely to press, in an undirected, shortsighted way.
-And then fine-tune that model on a specific subset of videos that feature the early game.
-And only then use some mostly-standard RL training to get good at some task.
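Here is the sketch: how those stages might chain together, with every function name a placeholder of mine (not anyone’s actual code); the trained components are passed in as callables so the outline stays self-contained.

```python
# Placeholder outline of the stacked training stages described above.
from typing import Callable, List, Sequence, Tuple

Video = object    # stand-in for a gameplay video
Actions = list    # per-frame keypress/mouse labels
Policy = object   # stand-in for the resulting agent

def build_agent(
    youtube_videos: Sequence[Video],
    is_clean: Callable[[Video], bool],           # stage 1: no face cams / overlays
    label_actions: Callable[[Video], Actions],   # stage 2: model trained on contractor data
    behavior_clone: Callable[[List[Tuple[Video, Actions]]], Policy],    # stage 3
    is_early_game: Callable[[Video], bool],      # stage 4: subset selector
    finetune: Callable[[Policy, List[Tuple[Video, Actions]]], Policy],  # stage 4
    run_rl: Callable[[Policy], Policy],          # stage 5: task-specific RL
) -> Policy:
    # 1. Keep only videos free of external artifacts.
    clean = [v for v in youtube_videos if is_clean(v)]
    # 2. Label each frame of the clean videos with guessed key presses.
    labeled = [(v, label_actions(v)) for v in clean]
    # 3. Behavioral cloning: imitate the most likely human input, frame by frame.
    policy = behavior_clone(labeled)
    # 4. Fine-tune on early-game footage only.
    policy = finetune(policy, [(v, a) for (v, a) in labeled if is_early_game(v)])
    # 5. Mostly-standard RL on top, for a specific task.
    return run_rl(policy)
```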
While the engineer learned one lesson, the PM will learn a different one when a bunch of the bombs start installing operating-system updates during the mission, or refuse to work with the new wi-fi system, or something: the folly of trying to align an agent by applying a new special-case patch whenever something goes wrong.
No matter how many patches you apply, the safety-optimizing agent keeps going for the nearest unblocked strategy, and if you keep applying patches, you eventually reach a point where its solution is too complicated for you to understand how it could go wrong.
Meta: This is now the top-voted LessWrong post of all time.
Robust Agents seems sort of similar but not quite right.
Looking at the generation code, I see that aptitude had interesting effects on our predecessors’ choice of cheats.
Good:
-Higher aptitude Hikkikomori and Otaku are less likely to take Hypercompetent Dark Side (which has lower benefits for higher aptitude characters).
Bad:
-Higher aptitude characters across the board are less likely to take Monstrous Regeneration or Anomalous Agility, which were some of the better choices available.
Ugly:
-Higher aptitude Hikkikomori are more likely to take Mind Palace.
I’ve added a market on Manifold if you want to bet on which strategy is best.
Somewhat. The profile pic changes based on the character’s emotions, or their reaction to a situation. Sometimes there’s a reply where the text is blank and the only content is the character’s reaction as conveyed by the profile pic.
That said, it’s a minor enough element that you wouldn’t lose too much if it wasn’t there.
On the other hand, it is important for you to know which character each reply is associated with, as trying to figure out who’s talking from the text alone could get confusing in many scenes. So any format change should at least preserve the names.
If everyone ends up with the same vote distribution, I think it removes the incentive for colluding beforehand, but it also means the vote is no longer meaningfully quadratic. The rank ordering of the candidates will be in order of how many total points were spent on them, and you basically end up with score voting.
edit: I assume that the automatic collusion mechanism is something like averaging the two ballots’ allocations for each candidate, which does not change the number of points spent on each candidate. If instead some ballots end up causing more points to be spent on their preferred candidates than they initially had to work with, there are almost definitely opportunities for strategic voting and beforehand collusion.
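To spell that assumption out, here is a toy check using the usual quadratic-cost rule, where spending p points on a candidate casts sqrt(p) votes; the numbers are arbitrary:

```python
# Toy check: averaging two ballots' point allocations leaves the per-candidate
# point totals unchanged, and once every ballot is identical the vote totals are
# just a monotone (sqrt) function of those point totals -- i.e. score voting.
from math import sqrt

def average(b1, b2):
    candidates = set(b1) | set(b2)
    return {c: (b1.get(c, 0) + b2.get(c, 0)) / 2 for c in candidates}

def tally(ballots):
    points, votes = {}, {}
    for b in ballots:
        for c, p in b.items():
            points[c] = points.get(c, 0) + p
            votes[c] = votes.get(c, 0) + sqrt(p)
    return points, votes

alice = {"A": 81, "B": 19}
bob   = {"A": 25, "B": 75}

merged = average(alice, bob)              # both ballots become {"A": 53, "B": 47}
points_before, _ = tally([alice, bob])    # {"A": 106, "B": 94}
points_after, votes_after = tally([merged, merged])
print(points_before == points_after)      # True: spending per candidate is preserved
print(votes_after)                        # ranking follows total points spent
```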
Or put a spoilered link to this post in the dath ilan tag’s wiki text?
A type of forum roleplay / collaborative fiction writing started by Alicorn.
As a further complication, what if potential backers have different estimates of the project’s value?
That would raise the risk of backing projects you don’t like just for the bonus. Maybe you would back the punch-cute-puppies project when it’s at 5% or 25% funded, but if it’s at 75% you start to suspect that there are enough cute-puppy haters out there to push it all the way if you get greedy for the bonus.
For good projects, you could have a source for the refund bonuses other than the platform or the project organizers—the most devoted fans. Allow backers to submit a pledge that, if the project is refunded, gets distributed to other backers rather than the person who submitted it.
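A toy sketch of how that settlement could work (the field names and the pro-rata split are my own assumptions, not a worked-out proposal):

```python
# Toy settlement logic for "devoted fans fund the refund bonus".
# Each pledge is (backer, amount, donates_bonus): donates_bonus=True means the
# pledger forgoes their refund and lets it be paid out to the other backers.
def settle(pledges, goal):
    total = sum(amount for _, amount, _ in pledges)
    if total >= goal:
        return {"funded": True, "raised": total}

    bonus_pool = sum(amt for _, amt, donates in pledges if donates)
    ordinary = [(backer, amt) for backer, amt, donates in pledges if not donates]
    ordinary_total = sum(amt for _, amt in ordinary) or 1  # avoid divide-by-zero

    # Ordinary backers get their pledge back plus a pro-rata share of the fan pool.
    refunds = {backer: amt + bonus_pool * amt / ordinary_total for backer, amt in ordinary}
    return {"funded": False, "refunds": refunds}

# Example: the project misses its goal, so the fan's 50 is split among the others.
print(settle([("fan", 50, True), ("alice", 100, False), ("bob", 300, False)], goal=1000))
```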
On first read I was annoyed at the post for criticizing futurists for being too certain in their predictions while also throwing out and refusing to grade any prediction that expressed uncertainty, on the grounds that saying something “may” happen is unfalsifiable.
On reflection these two things seem mostly unrelated, and for the purpose of establishing a track record “may” predictions do seem strictly worse than either predicting confidently (which allows scoring % of predictions right), or predicting with a probability (which none of these futurists did, but allows creating a calibration curve).
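For concreteness, the two scoring options amount to roughly this (an illustrative sketch of my own, not the post’s methodology):

```python
# Scoring confident predictions vs. probabilistic predictions (illustrative only).

def fraction_right(confident):
    """confident: list of (prediction_text, came_true) for flat 'X will happen' claims."""
    return sum(came_true for _, came_true in confident) / len(confident)

def calibration_curve(probabilistic, n_bins=10):
    """probabilistic: list of (stated_probability, came_true).
    Returns (mean stated probability, observed frequency) for each non-empty bin;
    a well-calibrated predictor has the two roughly equal in every bin."""
    bins = [[] for _ in range(n_bins)]
    for p, outcome in probabilistic:
        bins[min(int(p * n_bins), n_bins - 1)].append((p, outcome))
    return [
        (sum(p for p, _ in b) / len(b), sum(o for _, o in b) / len(b))
        for b in bins
        if b
    ]
```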