Born too late to explore Earth; born too early to explore the galaxy; born at just the right time to save humanity.
Ulisse Mini
That link isn’t working for me, can you send screenshots or something? When I try and load it I get an infinite loading screen.
Re(prompt ChatGPT): I’d already tried what you did and some (imo) better prompt engineering, and kept getting a character I thought was overly wordy/helpful (constantly asking me what it could do to help vs. just doing it). A better prompt engineer might be able to get something working though.
Can you give specific examples/screenshots of prompts and outputs? I know you said reading the chat logs wouldn’t be the same as experiencing it in real time, but some specific claims, like the prompt
The following is a conversation with Charlotte, an AGI designed to provide the ultimate GFE
resulting in a conversation like that, are highly implausible.[1] At a minimum you’d need to do some prompt engineering, and even with that, some of this is implausible with ChatGPT, which typically acts very unnaturally after all the RLHF OAI did.
[1] Source: I tried it, and tried some basic prompt engineering, and it still resulted in bad outputs.
Interesting, I didn’t know the history; maybe I’m insufficiently pessimistic about these things. Consider my query retracted.
Congratulations!
Linear Algebra Done Right is great for gaining proof skills, though for the record I’ve read it and haven’t solved alignment yet. I think I need several more passes of linear algebra :)
Are most uncertainties we care about logical rather than informational? All empirical ML experiments are pure computations a Bayesian superintelligence could do in its head. How much of our uncertainty comes from computational limits in practice, versus actual information bottlenecks?
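A toy contrast to make the distinction concrete (my own hypothetical illustration, not a formal definition): logical uncertainty can in principle be resolved by doing more computation, while informational uncertainty needs an observation from outside your head.

```python
import hashlib

# Logical uncertainty: I may not know this byte before running the line,
# but nothing in the world needs to be observed -- only computed.
first_byte = hashlib.sha256(b"hello").digest()[0]
print("resolved by computation:", first_byte)

# Informational uncertainty: no amount of computation tells me how a coin
# I haven't seen actually landed; the input() call stands in for going and
# looking at the world.
observation = input("How did the coin land? ")
print("resolved by observation:", observation)
```

On this framing, an empirical ML experiment is in the first category for a strong enough reasoner (it’s a deterministic function of code, data, and seeds), which is why the question is about where our uncertainty comes from in practice.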
A trick to remember: the first letter of each virtue gives (in blocks): CRL EAES HP PSV, which can easily be remembered as “cooperative reinforcement learning, EAs, Harry Potter, PS: The last virtue is the void.”
(Obviously remembering these is pointless, but memorizing lists is a nice way to practice mnemonic technique.)
We propose Algorithm Distillation (AD), a method for distilling reinforcement learning (RL) algorithms into neural networks by modeling their training histories with a causal sequence model. Algorithm Distillation treats learning to reinforcement learn as an across-episode sequential prediction problem. A dataset of learning histories is generated by a source RL algorithm, and then a causal transformer is trained by autoregressively predicting actions given their preceding learning histories as context. Unlike sequential policy prediction architectures that distill post-learning or expert sequences, AD is able to improve its policy entirely in-context without updating its network parameters. We demonstrate that AD can reinforcement learn in-context in a variety of environments with sparse rewards, combinatorial task structure, and pixel-based observations, and find that AD learns a more data-efficient RL algorithm than the one that generated the source data.
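For concreteness, here’s a minimal sketch of how I read the setup (my own hypothetical code based on the abstract, not the authors’ implementation): tokenize long across-episode histories of (observation, action, reward) from a source RL algorithm, and train a causal transformer to predict the source algorithm’s actions autoregressively.

```python
import torch
import torch.nn as nn

class ADTransformer(nn.Module):
    """Causal transformer over across-episode (obs, prev_action, prev_reward) tokens."""
    def __init__(self, obs_dim, n_actions, d_model=128, n_heads=4, n_layers=4, ctx_len=512):
        super().__init__()
        # One token per timestep: observation concatenated with one-hot previous
        # action and previous reward (a simple hypothetical tokenization).
        self.embed = nn.Linear(obs_dim + n_actions + 1, d_model)
        self.pos = nn.Embedding(ctx_len, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, n_actions)

    def forward(self, tokens):
        # tokens: (batch, T, obs_dim + n_actions + 1), where T spans many episodes
        T = tokens.shape[1]
        x = self.embed(tokens) + self.pos(torch.arange(T, device=tokens.device))
        causal_mask = nn.Transformer.generate_square_subsequent_mask(T).to(tokens.device)
        h = self.backbone(x, mask=causal_mask)  # only attend to earlier learning history
        return self.head(h)                     # logits for the source algorithm's next action

# Training sketch: behavior-clone the source RL algorithm across its whole
# learning history, so the transformer has to model how the policy improves.
model = ADTransformer(obs_dim=16, n_actions=4)
opt = torch.optim.Adam(model.parameters(), lr=3e-4)
tokens = torch.randn(8, 512, 16 + 4 + 1)        # stand-in for real learning histories
source_actions = torch.randint(0, 4, (8, 512))  # actions the source algorithm took
logits = model(tokens)
loss = nn.functional.cross_entropy(logits.reshape(-1, 4), source_actions.reshape(-1))
loss.backward()
opt.step()
```

At evaluation time the same model is rolled out with its own growing history as context, so any improvement it shows is in-context rather than from parameter updates, which is the paper’s central claim.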
EleutherAI’s #alignment channels are good to ask questions in. Some specific answers:

“I understand that a reward maximiser would wire-head (take control over the reward provision mechanism), but I don’t see why training an RL agent would necessarily end up in a reward-maximising agent? TurnTrout’s Reward is Not the Optimisation Target shed some clarity on this, but I definitely have remaining questions.”
Leo Gao’s Toward Deconfusing Wireheading and Reward Maximization sheds some light on this.
How can I look at my children and not already be mourning their death from day 1?
Suppose you lived in the dark times, where children had a <50% chance of living to adulthood. Wouldn’t you still have kids, even if probabilistically smallpox was likely to take them?
If AI kills us all, will my children suffer? Will it be my fault for having brought them into the world while knowing this would happen?
Even if they don’t live to adulthood, I’d still view their childhoods as valuable. Arguably higher average utility than adulthood.
Even if my children’s short lives are happy, wouldn’t their happiness be fundamentally false and devoid of meaning?
Our lifetimes are currently bounded, are they false and devoid of all meaning?
The negentropy in the universe is also bounded, is the universe false and devoid of all meaning?
Random thought: Perhaps you could carefully engineer gradient starvation in order to “avoid generalizing” and defeat the Discrete modes of prediction example. You’d only need to delay it until reflection, then the AI can solve the successor AI problem.
In general: hack our way towards getting value-preserving reflectivity before values drift from “diamonds” → “what’s labeled as a diamond by humans” (or, in the truthfulness case, from “telling the truth” → “what the human thinks is true”).
I disagree that the policy must be worth selling (see e.g. Jordan Belfort). Many salespeople can sell things that aren’t worth buying. See also Never Split the Difference for an example of negotiation when you have little or worse leverage.
(Also, I don’t think How to Win Friends and Influence People boils down to satisfying an eager want; the other advice is super important too, e.g. don’t criticize, be genuinely interested in the person, …)
Both are important, but I disagree that power is always needed. In examples 3, 7, and 9 it isn’t clear that the compromise is actually better for the convinced party: the insurance is likely -EV, the peas aren’t actually a crux to defeating the bully, and the child would likely be happier outside kindergarten.
From skimming the benchmark and the paper this seems overhyped (like Gato). Roughly it looks like:
May 2022: DeepMind releases a new benchmark for learning algorithms.
...Nobody cares (according to Google Scholar citations).
Dec 2022: DeepMind releases a thing that beats the baselines on their benchmark.
I don’t know much about GNNs & only did a surface-level skim so I’m interested to hear other takes.
Interesting perspective, kinda reminds me of the ROME paper where it seems to only do “shallow counterfactuals”.
unpopular opinion: I like the ending of the subsequent film
IMO it’s a natural continuation for Homura. After spending decades of subjective time trying to save someone would you really let them go like that? Homura isn’t an altruist, she doesn’t care about the lifetime of the universe—she just wants Madoka.
I was directing that towards lesswrongers reading my answer, not the general population.
I think school is huge in preventing people from becoming smart and curious. I spent 1–2 years where I hardly studied at all and mostly played videogames, but when I quit I did so of my own free will. I think there’s a huge difference between discipline imposed from the outside vs from the inside, and getting to the latter is worth a lot (though I wish I hadn’t wasted all that time, haha).
I’m unsure which parts of my upbringing were cruxes for unschooling working. You should probably read a book or something rather than taking my (very abnormal) opinion. I just know how it went for me :)
Epistemic status: personal experience.
I’m unschooled and think it’s clearly better, even if you factor in my parents being significantly above average in parenting. Optimistically, school is babysitting: people learn nothing there while wasting most of their childhood. Pessimistically, it’s actively harmful, teaching people to hate learning and building antibodies against education.
Here’s a good documentary made by someone who’s been in and out of school. I can’t give detailed criticism since I (thankfully) never had to go to school.
EDIT: As for what the alternative should be, I honestly don’t know. Shifting equilibria is hard, though it’s easy to give better examples (e.g. dath ilan, things in the documentary I linked). For a personal solution: homeschool your kids.
#3 is good. Another good reason is so you have enough mathematical maturity to understand fancy theoretical results.
I’m probably overestimating the importance of #4, really I just like having the ability to pick up a random undergrad/early-grad math book and understand what’s going on, and I’d like to extend that further up the tree :)
Character.ai seems to have a lot more personality than ChatGPT. I feel bad for not thanking you earlier (as I was in disbelief), but everything here is valuable safety information. Thank you for sharing, despite the potential embarrassment :)