Born too late to explore Earth; born too early to explore the galaxy; born at just the right time to save humanity.
Ulisse Mini
You’re probably in LessWrong docs mode; either switch to markdown, or press Ctrl+K to insert a link around the selected text.
Here’s a collection of links to recent “mind-reading”-related results using deep learning. Comment any I missed!
PS: It seems like a good idea for alignment people (like Steve) to have the capacity to run novel human-brain experiments like these. If we don’t currently have this capacity, well… Free dignity points to be gained :)
[ASoT] Reflectivity in Narrow AI
I see, thanks!
If you move a small distance in parallel to the manifold, then your distance from the manifold goes as
This doesn’t make sense to me, why is this true?
My alternate intuition is that the directional derivative (which is a dot product between the gradient and ds) along the manifold is zero because we aren’t changing our behavior on the training set.
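To spell that intuition out, here’s a minimal sketch in my own notation (not the original post’s): $L$ is the training loss, the manifold is its zero set, and $ds$ is a small step tangent to it.

```latex
\documentclass{article}
\usepackage{amsmath}
\begin{document}
% Sketch (my notation): L = training loss, the manifold = the zero set of L,
% ds = a small step tangent to the manifold, H = Hessian of L at theta.
Moving along the manifold doesn't change behavior on the training set, so the
directional derivative of the loss vanishes:
\[
  \nabla_{ds} L(\theta) \,=\, \nabla L(\theta) \cdot ds \,=\, 0 .
\]
Expanding to second order,
\[
  L(\theta + ds) \,=\, L(\theta)
    + \underbrace{\nabla L(\theta) \cdot ds}_{=\,0}
    + \tfrac{1}{2}\, ds^{\top} H(\theta)\, ds
    + O(\|ds\|^{3}),
\]
so the loss (and, to leading order, the distance from the manifold) changes only
at second order in $\|ds\|$.
\end{document}
```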
Characterizing Intrinsic Compositionality in Transformers with Tree Projections
I get the impression that you picture something more narrow with my comment than I intended? I don’t think my comment is limited to Bayesian rationality; we could also consider non-Bayesian reasoning approaches like logic or frequentism or similar. Even CNNs or transformers would go under what I was talking about. Or Aumann’s agreement theorem, or lots of other things.
I agree those all count, but they (mostly) have Bayesian interpretations, which is what I was referring to.
More saliently though, whatever mechanism you implement to potentially “release” the AGI into simulated universes could be gamed or hacked by the AGI itself.
I think this is fixable; the Game of Life isn’t that complicated, so you could prove the release mechanism correct somehow.
Heck, this might not even be necessary—if all they’re getting are simulated universes, then they could probably create those themselves since they’re running on arbitrarily large compute anyway.
This is a great point; I forgot AIXI also has unbounded compute, so why would it want to escape to get more?
I don’t think AIXI can “care” about universes it simulates itself, probably because of the cartesian boundary (non-embeddedness): its utility function is defined on its inputs, which AIXI doesn’t control. But I’m not sure; I don’t understand AIXI well.
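For reference, here’s the standard AIXI action rule (paraphrasing Hutter from memory, so treat the exact indexing and conditioning as approximate): the rewards $r_k$ AIXI maximizes are components of its percepts $x_k = (o_k, r_k)$, i.e. things the environment feeds it, not facts about any universe it simulates internally.

```latex
\documentclass{article}
\usepackage{amsmath}
\begin{document}
% Rough statement of the AIXI expectimax rule (Hutter); the indexing and the exact
% conditioning on past percepts are approximate here, written from memory.
% x_k = (o_k, r_k) are observation/reward percepts; xi is the Solomonoff mixture
% over computable environments; m is the horizon.
\[
  a_t \,=\, \arg\max_{a_t} \sum_{x_t} \cdots \max_{a_m} \sum_{x_m}
    \bigl( r_t + \dots + r_m \bigr)\,
    \xi\bigl( x_{t:m} \mid a_{1:m},\, x_{<t} \bigr)
\]
The objective is a function of the percept sequence, which is why it's unclear
(to me) how AIXI could terminally value the state of a universe it simulates.
\end{document}
```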
You’re also making the assumption that these AIs would care about what happens inside a simulation created in the future, as something to guide their current actions. This may be true of some AI systems, but feels like a pretty strong one to hold universally.
The simulation being “created in the future” doesn’t seem to matter to me. You could also already be simulating the two universes and have the game decide whether the AIs gain access to them.
(I think this is a pretty cool post, by the way, and appreciate more ASoT content).
Thanks! Will do
I unrealistically assumed that I got to pick the environment, i.e. it was “solve this problem for some environment” whereas in reality it’s “solve this problem for every environment in some class of natural environments” or something. This is a good part of how I’m assuming my way out of reality :)
I don’t call this instrumental convergence (of goals); it’s more like the Bayesian bowl that all intelligent agents fall towards. I also think the instrumental convergence of goals is stronger/more certain than the convergence to approximately Bayesian reasoners.
[ASoT] Instrumental convergence is useful
[ASoT] Thoughts on GPT-N
Random thought I had about this: IIRC the science of skill transfer between fields shows it doesn’t really happen except in people with a high degree of mastery. (Cite: I think Ultralearning or Peak mentions this.)
Might be something to look into for Refine, a master of X could be significantly better at transferring insights from X to Y.
Mmm, I think it matters a lot which of the 10B[1] values are harder to instill; I think most of the difficulty is in corrigibility. Strong corrigibility seems like it basically solves alignment. If this is the case, then corrigibility is a great thing to aim for, since it’s the real “hard part”, as opposed to random human values. I’m ranting now though… :L
- ^
I think it’s way less than 10B, probably <1000, though I haven’t thought about this much and I don’t know what you’re counting as one “value” (if you mean value shards, maybe closer to 10B; if you mean human-interpretable values, I think <1000).
In a recent post, John mentioned how Corrigibility being a subset of Human Values means we should consider using Corrigibility as an alignment target. This is a useful perspective, but I want to register that $X \subseteq Y$ doesn’t always imply that doing $X$ is “easier” than doing $Y$. This is similar to the problems with problem factorization for alignment, but even stronger! Even if we only want to solve $X$ and not $Y$, $X$ can still be harder!
For a few examples of this:
Acquiring half a strawberry by itself is harder than acquiring a full strawberry (you have to get a full strawberry, then cut it in half). (This holds for X = MacBook, person, penny too.)
Let $L$ be a lemma used in a proof of a theorem $T$ (meaning $L \subseteq T$ in some sense). It may be that $T$ can be immediately proved via a known, more general theorem $G$. In this case $L$ is harder to directly prove than $T$.
When writing an essay, writing section 3 alone can be harder than writing the whole essay, because it interacts with the other parts, you learn from writing the previous parts, etc. (Sidenote: there’s a trivial sense[1] in which writing section 3 can be no harder than writing the whole essay, but in practice we don’t care, as the whole point of considering a decomposition is to do better.)
In general, depending on how “natural” the subproblem in the factorization is, solving a subproblem can be harder than solving the original problem (toy sketch below). I believe this may (30%) be the case with corrigibility, mainly because (1) corrigibility is anti-natural in some ways, and (2) humans are pretty good at human values while being not-that-corrigible.
- ^
Just write the whole thing then throw everything else away!
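To make that pattern explicit, here’s a toy formalization in my own notation ($\operatorname{cost}(\cdot)$ is an informal “difficulty” measure, not anything from John’s post): being a sub-problem gives no guarantee of being cheaper.

```latex
\documentclass{article}
\usepackage{amsmath}
\begin{document}
% Toy formalization (my notation): a subset relation on problems does not bound cost.
Even when $X \subseteq Y$, it can happen that
\[
  \operatorname{cost}(X) \,>\, \operatorname{cost}(Y),
\]
e.g.\ in the strawberry example the cheapest known route to $X$ goes through $Y$:
\[
  \operatorname{cost}(\text{half strawberry})
    \,\approx\, \operatorname{cost}(\text{whole strawberry})
      + \operatorname{cost}(\text{cutting})
    \,\ge\, \operatorname{cost}(\text{whole strawberry}).
\]
\end{document}
```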
Is there any validity in this notion of Cross-Void Optimization?
This reminds me of Josh Waitzkin in The Art of Learning:
This was an exciting time. As I internalized Tai Chi’s technical foundation, I began to see my chess understanding manifesting itself in the Push Hands game. I was intimate with competition, so offbeat strategic dynamics were in my blood. I would notice structural flaws in someone’s posture, just as I might pick apart a chess position, or I’d play with combinations in a manner people were not familiar with.
And
From the outside Tai Chi and chess couldn’t be more different, but they began to converge in my mind. I started to translate my chess ideas into Tai Chi language, as if the two arts were linked by an essential connecting ground. Every day I noticed more and more similarities, until I began to feel as if I were studying chess when I was studying Tai Chi. Once I was giving a forty-board simultaneous chess exhibition in Memphis and I realized halfway through that I had been playing all the games as Tai Chi. I wasn’t calculating with chess notation or thinking about opening variations…I was feeling flow, filling space left behind, riding waves like I do at sea or in martial arts. This was wild! I was winning chess games without playing chess.
Jujitsu seems cool by the way, I’ve been meaning to start it :D
I believe you’ve got a typo in the defn of
Shouldn’t it be ?
Also it appears like you can simplify
Into Not sure if this wasn’t done for some reason or it was a typo.
This is awesome, I hope I meet your kids sometime.
Relevant part of a documentary made by a homeschooler (whom you could talk to on Discord if you’re interested). I’m also homeschooled.
There’s a series of math books that give a wide overview of a lot of math. In the spirit of comprehensive information gathering, I’m going to try to spend my “fun math time” reading these.
I theorize this is a good way to build mathematical maturity, at least the “parse advanced math” part. I remember when I became mathematically mature enough to read Math Wikipedia; I want to go further in this direction until I can read math-y papers like I read Wikipedia.