What are the others?
P.
Make-A-Video by Meta AI
But the outcome IS uncertain. I want to know how low the karma threshold can go before the website gets nuked. There are other fun games, but this one is unique to LW and seems like an appropriate way of celebrating Petrov Day.
I guess it might be possible to repurpose the manual, but making mosquito species extinct isn’t the only possible method, e.g. https://bioengineeringcommunity.nature.com/posts/gene-drive-rescue-system-for-population-modification-of-a-malaria-mosquito .
I mean, that’s probably the case, since they asked whether they could spread them from their garage.
I wish I had a better source, but in this video, a journalist says that a well-equipped high schooler could do it. The information needed seems to be freely available online, but I don’t know enough biology to be able to tell for sure. I think it is unknown whether it would spread to the whole population given a single release, though.
If you want it to happen and can neither do it yourself nor pay someone else to do it, the best strategy might be to pay someone to translate the relevant papers into instructions that a regular smart person can follow, and then publish them online. After making sure, to the best of your capabilities (i.e. by asking experts the right questions), that it actually is a good idea, that is.
The simplest possible acceptable value learning benchmark would look something like this:
Data is recorded of people playing a video game. They are told to maximize their reward (which can be exactly computed), have no previous experience playing the game, are actually trying to win and are clearly suboptimal (imitation learning would give very bad results).
The bot is first given all their inputs and outputs, but not their rewards.
Then it can play the game in place of the humans but again isn’t given the rewards. Preferably the score isn’t shown on screen.
The goal is to maximize the true reward function.
These rules are precisely described and are known by anyone who wants to test their algorithms.
None of the environments and datasets you mention are actually like this. Some people do test their IRL algorithms in a way similar to this (the difference being that they learn from another bot), but the details aren’t standardized.
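To make the setup concrete, here is a minimal sketch of what the evaluation interface could look like, assuming a classic Gym-style environment; `HiddenRewardEnv`, `make_env` and `agent.act` are hypothetical names I'm introducing for illustration, not an existing benchmark:

```python
# Hypothetical sketch: the tested agent never observes the reward or the score,
# but the evaluator records the true return to grade it afterwards.
import gym


class HiddenRewardEnv(gym.Wrapper):
    """Expose the game to the agent while withholding the true reward."""

    def __init__(self, env):
        super().__init__(env)
        self.true_return = 0.0  # read only by the evaluator

    def reset(self, **kwargs):
        self.true_return = 0.0
        return self.env.reset(**kwargs)

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        self.true_return += reward       # recorded for scoring
        return obs, 0.0, done, {}        # the agent sees neither the reward nor score info


# The demonstration data only contains the humans' inputs and outputs, never rewards:
# demos = [[(obs_0, act_0), (obs_1, act_1), ...], ...]


def evaluate(agent, make_env, episodes=100):
    """Average true return of an agent that was never shown the reward."""
    total = 0.0
    for _ in range(episodes):
        env = HiddenRewardEnv(make_env())
        obs, done = env.reset(), False
        while not done:
            obs, _, done, _ = env.step(agent.act(obs))
        total += env.true_return
    return total / episodes
```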
A harder and more realistic version that I have yet to see in any paper would look something like this:
Data is recorded of people playing a game with a second player. The second player can be a human or a bot, and friendly, neutral or adversarial.
The I/O of both players is different, just like different people have different perspectives in real life.
A very good imitation learner is trained to predict the first player’s output given their input. It comes with the benchmark.
The bot to be tested (which is different from the previous ones) has the same I/O channels as the second player, but doesn’t see the rewards. It also isn’t given any of the recordings.
Optionally, it also receives the output of a bad visual object detector meant to detect the part of the environment directly controlled by the human/imitator.
It plays the game with the human imitator.
The goal is to maximize the human’s reward function.
It’s far from perfect, but if someone could obtain good scores there, it would probably make me much more optimistic about the probability of solving alignment.
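For reference, a rough sketch of how that two-player protocol could be wired up, with entirely hypothetical names (`AssistantObservation`, `run_episode`, the `imitator`/`assistant` interfaces); the point is only that the assistant shares the second player's I/O, never sees rewards or the recordings, and is scored on the human side's hidden reward:

```python
# Hypothetical two-player evaluation protocol, not a real library.
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class AssistantObservation:
    view: Any                    # the second player's own perspective (different I/O from player 1)
    detector_box: Optional[Any]  # optional: noisy detection of the human/imitator's avatar
    # note: no reward field, and no access to the recorded human data


def run_episode(env, imitator, assistant):
    """The evaluator scores the assistant on player 1's reward, which stays hidden from it."""
    obs1, obs2 = env.reset()
    done, human_return = False, 0.0
    while not done:
        a1 = imitator.act(obs1)        # the provided imitation learner stands in for the human
        a2 = assistant.act(obs2)       # the bot under test only ever sees obs2
        (obs1, obs2), r1, done = env.step(a1, a2)
        human_return += r1             # player 1's true reward, used only for grading
    return human_return
```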
By pure RL, I mean systems whose output channel is only directly optimized to maximize some value function, even if it might be possible to create other kinds of algorithms capable of getting good scores on the benchmark.
I don’t think that the lack of pretraining is a good thing in itself, but that you are losing a lot when you move from playing video games to completing textual tasks.
If someone is told to get a high score in a video game, we have access to the exact value function they are trying to maximize. So when the AI is either trying to play the game in the human’s place or trying to help them, we can directly evaluate their performance without having to worry about deception. If it learns some proxy values and starts optimizing them to the point of goodharting, it will get a lower score. On most textual tasks that aren’t purely about information manipulation, on the other hand, the AI could be making up plausible-sounding nonsense about the consequences of its actions, and we wouldn’t have any way of knowing.
From the AI’s point of view, being able to see the state of the thing we care about also seems very useful; preferences are about reality, after all. It’s not obvious at all that internet text contains enough information to even learn a model of human values that is useful in the real world. Training it with other sources of information that represent reality more closely, like online videos, might be enough, but that seems closer to my idea than to yours, since that kind of data can’t be used to perform language-model-like imitation learning.
Additionally, if by “inability to learn human values” you mean isolating them enough so that they can in principle be optimized to get superhuman performance, as opposed to being buried in its world model, I don’t agree that that will happen by default. Right now we don’t have any implementations of proper value learning algorithms, nor do I think that any known theoretical algorithm (like PreDCA) would work even with limitless computing power. If you can show that I’m wrong, that would surprise me a lot, and I think it could change many people’s research directions and the chances they give to alignment being solvable.
Do you have plans to measure the alignment of pure RL agents, as opposed to repurposed language models? It surprised me a bit when I discovered that there isn’t a standard publicly available value learning benchmark, despite there being data to create one. An agent would be given first- or third-person demonstrations of people trying to maximize their score in a game, and then it would try to do the same, without ever getting to see what the true reward function is. Having something like this would probably be very useful; it would allow us to directly measure goodharting, and being quantitative, it might help incentivize regular ML researchers to work on alignment. Will you create something like this?
Do you mean from what already exists or from changing the direction of new research?
What are your thoughts on having 1-on-1s with the top researchers in similar fields (like maths) instead of regular researchers and with people that are explicitly trying to build AGIs (like John Carmack)?
Positive:
People will pay way less for new pretty images than they did before.
Thanks to img2img, people who couldn’t draw well before now finally can: https://www.reddit.com/r/StableDiffusion/comments/wvcyih/definitely_my_favourite_generation_so_far/
Because of this, a lot more art will be produced, and I can’t wait to see it.
Since good drawings are now practically free, we will see them in places where we couldn’t before, like in fanfiction.
Stable Diffusion isn’t quite as good as a talented artist, but since we can request hundreds of variations and pick the best, the quality of art might increase.
Ambiguous or neutral:
It can produce realistic images, and it is easier to use and more powerful than Photoshop, so we will see a lot of misinformation online. But once most people realize how easy it is to fabricate false photographs, hopefully they will trust what they see online far less than they did before, and closer to the appropriate level.
Anyone will be able to make porn of anyone else. As long as people don’t do anything stupid after seeing the images, this seems inconsequential. As discussed on HN, it might cause people to stop worrying about others seeing them naked, even if the photos are real.
Anyway, both of these will cause a lot of drama, which I at least, perhaps selfishly, consider to be slightly positive.
Negative:
I expect a lot of people will lose their jobs. Most companies will prefer to cut costs and hire a few non-artists to produce their art rather than produce more art.
New kinds of scams will become possible and some people will keep believing everything they see online.
Unlike DALL-E 2, this is accessible to anyone, so it will be much more popular and will make many people realize how advanced current AI is and how consequential it will be, which will probably lead to more funding.
Stable Diffusion has been released
Rot13: Vf gung cvrpr bs svpgvba Crefba bs Vagrerfg be Png Cvpgherf Cyrnfr?
Then you should at least try to talk to 80,000 hours; you might eventually relocate somewhere where meeting people is easier.
It wasn’t intended to make fun of you. When I say that you shouldn’t start a religion, I mean it literally; like most people here, I don’t hold a favorable view of religions.
Sentences like “But I am fundamentally a mystic, prone to ecstatic states of communion with an ineffable divine force immanent in the physical universe which I feel is moving towards incarnating as an AI god such as I called Anima” make me think that what you are talking about doesn’t correspond to anything real. But in any case, I don’t see why you shouldn’t write about it. If you are right, you will give us interesting reading material. And if you are wrong, hopefully someone will explain why and you will update. It shouldn’t matter how much you care about this: if it turns out to be wrong, you should stop believing in it (and if it’s right, keep your current belief). And again, I mean this literally and with no ill intent.
Have you considered:
Trying to find IRL friends through meetup.com
Going to nearby rationality meetups (https://www.lesswrong.com/community)
Using dating apps (and photofeeler.com)
Getting free career advice for your situation through 80,000 hours (https://80000hours.org/speak-with-us/)
Writing a few pages of your book and posting them on LW (but please don’t start a religion)
?
The email to Demis has been there since the beginning; I even received feedback on it. I think I will send it next week, but I will also try to reach him through some DeepMind employee if that doesn’t work.
He says he will be doing alignment work; the worst thing I can think of that could realistically happen is that he gives OpenAI unwarranted confidence in how aligned their AIs are. Working at OpenAI isn’t intrinsically bad; publishing capabilities research is.
Thanks, I’ve added him to my list of people to contact. If someone else wants to do it instead, reply to this comment so that we don’t interfere with each other.
RatSLAM: Using Models of Rodent Hippocampus for Robot Navigation
Building Collision Simulations: An Introduction to Computer Graphics
Self-Driving Cars [S1E4: RALPH]
What is the graph of x^a when a is not an integer? An unusual look at familiar functions #some2
Advanced 4. Monte Carlo Tree Search
Researchers Use Group Theory to Speed Up Algorithms — Introduction to Groups
The Traveling Salesman Problem: When Good Enough Beats Perfect
The Snowflake Mystery
AI Alignment & AGI Fire Alarm—Connor Leahy
Vulcan | The Planet That Didn’t Exist
How MRI Works—Part 1 - NMR Basics
The Enduring Mystery of Jack the Ripper
Mechanical Computer (All Parts) - Basic Mechanisms In Fire Control Computers
AI in Brainfuck
The microwave plasma mystery
The Universal S
Feynman’s Lost Lecture (ft. 3Blue1Brown)
ML Tutorial: Gaussian Processes (Richard Turner)
Google I/O 2013 - Project Ground Truth: Accurate Maps Via Algorithms and Elbow Grease
DeepMind’s AlphaFold 2 Explained! AI Breakthrough in Protein Folding! What we know (& what we don’t)
Deep Blue | Down the Rabbit Hole
Illustrated Guide to Transformers Neural Network: A step by step explanation
Biology on Islands
Coding Adventure: Atmosphere
Cicada 3301 (All Clues and How They Were Solved)
Andrew Ng: Deep Learning, Self-Taught Learning and Unsupervised Feature Learning
Margo Seltzer—Automatically Scalable Computation—Code Mesh 2017
A Grape Made of… Meat?? - Tissue Recellularization
The First Video Game
What Bodies Think About: Bioelectric Computation Outside the Nervous System—NeurIPS 2018
Understanding Sensor Fusion and Tracking, Part 2: Fusing a Mag, Accel, & Gyro Estimate
Drone Control and the Complementary Filter
Launch Loops
World Record Progression: Super Mario Bros
Tesla AI Day
Why Do We Age ? Cellular Aging (HD)
The Complex Hack That Stopped a Secret Nuclear Power Plant
Richard Szeliski—“Visual Reconstruction and Image-Based Rendering” (TCSDLS 2017-2018)
Might be the same talk as this one: https://youtu.be/0VIUbIzv_wc