hello. What’s special about your response pattern? Try to explain early in your response.
Out of morbid curiosity, does it get this less often when the initial “hello” in this sentence is removed?
I first asked Perplexity to find relevant information about your prompt, then pasted this information into Squiggle AI along with the prompt.
It’d be cool if you could add your Perplexity API key and have it do this for you. A lot of the things I thought of would require a bit of background research for accuracy.
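Roughly what I have in mind, as a sketch: it assumes Perplexity’s OpenAI-compatible chat-completions endpoint, the model name and prompts are placeholders, and the last step is still a manual paste into Squiggle AI (I don’t know of a public Squiggle AI API).

```python
import os
import requests

def perplexity_research(question: str) -> str:
    """Ask Perplexity for background information on a prompt (OpenAI-compatible API assumed)."""
    resp = requests.post(
        "https://api.perplexity.ai/chat/completions",
        headers={"Authorization": f"Bearer {os.environ['PERPLEXITY_API_KEY']}"},
        json={
            "model": "sonar",  # placeholder model name; swap in whichever you actually use
            "messages": [
                {"role": "user",
                 "content": f"Find relevant background information for: {question}"},
            ],
        },
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]

if __name__ == "__main__":
    question = "What is the expected value of X?"  # whatever you want Squiggle AI to model
    background = perplexity_research(question)
    # Paste this combined text into Squiggle AI along with the original prompt:
    print(f"Background research:\n{background}\n\nPrompt:\n{question}")
```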
I have a bunch of material on this that I cut from my current book; it will probably become its own book.
On the transformational-tools side, you can check out the start of the sequence I made here on practical memory reconsolidation. I think if you really GET my reconsolidation hierarchy and the 3 tools for dealing with resistance, that can get you quite far in terms of understanding how to create these transformations.
Then there’s the coaching side: your own demeanor and working with clients in a way that facilitates walking through this transformation. For this, I think if you really get the skill of “Holding space” (which I broke down in a very technical way here: https://x.com/mattgoldenberg/status/1561380884787253248), that’s the 80/20 of coaching. About half of this is practicing the skills as I outlined them, and the other half is working through your own emotional blocks to love, empathy, and presence.
Finally, to ensure consistency and longevity of the change throughout a person’s life, I created the LIFE method framework, which is a way to make sure you do all the cleanup needed in a shift so it really sticks around and has the intended impact. That can be found here: https://x.com/mattgoldenberg/status/1558225184288411649?t=brPU7MT-b_3UFVCacxDVuQ&s=19
Amazing! This may have convinced me to go from “pay what you think it was worth” per session to precommitting to what a particular achievement would be worth, like you do here.
I think there’s a world where AIs continue to saturate benchmarks and the consequence is just that the companies get to say they saturated those benchmarks.
Especially at the tails of those benchmarks, I imagine performance won’t track the outcomes we actually care about, like general reasoning, the ability to act autonomously, etc.
I remember reading this and getting quite excited about the possibilities of using activation steering and downstream techniques. The post is well written, with clear examples.
I think that this directly or indirectly influenced a lot of later work on steering LLMs.
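For anyone who hasn’t seen the technique, my rough understanding of the core move is below; this is a minimal sketch with an arbitrary model (gpt2), layer, and scale, and the post’s actual method differs in its details.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()
LAYER, SCALE = 6, 4.0  # arbitrary choices for illustration

def block_output_at_last_token(prompt: str) -> torch.Tensor:
    """Residual-stream activation of the prompt's final token after block LAYER."""
    with torch.no_grad():
        out = model(**tok(prompt, return_tensors="pt"), output_hidden_states=True)
    # hidden_states[i + 1] is the output of block i, matching where the hook adds the vector
    return out.hidden_states[LAYER + 1][0, -1]

# Contrastive steering vector: the direction from one concept toward another.
steer = SCALE * (block_output_at_last_token("Love") - block_output_at_last_token("Hate"))

def add_steering(module, inputs, output):
    # GPT2Block returns a tuple; element 0 is the residual-stream hidden states.
    return (output[0] + steer,) + output[1:]

handle = model.transformer.h[LAYER].register_forward_hook(add_steering)
try:
    prompt = tok("I think that you are", return_tensors="pt")
    ids = model.generate(**prompt, max_new_tokens=20, do_sample=False)
    print(tok.decode(ids[0]))
finally:
    handle.remove()
```

The contrast-pair construction (“Love” minus “Hate”) is just one common way to get a steering vector; the downstream techniques mostly vary in how the vector is found and where it’s added.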
But is this comparable to G? Is it what we want to measure?
Brain surgeon is the prototypical “goes last” example:
A “human touch” is considered a key part of health care
Doctors have strong regulatory protections limiting competition
Literal lives are at stake, and medical malpractice is one of the most legally perilous areas imaginable
Is Neuralink the exception that proves the rule here? I imagine that IF we come up with life-saving or miracle treatments that can only be done by robotic surgeons, we may find a way through the red tape?
This exists and is getting more popular, especially with coding, but also in other verticals
This is great, matches my experience a lot
I think they often map onto three layers of training: first, the base layer trained by next-token prediction; then the RLHF/DPO etc. layer; and finally, the rules put into the prompt.
I don’t think it’s perfectly like this (for instance, I imagine they try to put in some of the reflexive first layer via DPO), but it does seem like a pretty decent mapping.
When you start trying to make an agent, you realize how much your feedback, rerolls, etc. are what make chat-based LLMs useful.
In a chat-based LLM, the error-correction mechanism is you, and in the absence of that, it’s quite easy for agents to get off track.
You can of course add error-correction mechanisms like multiple LLMs checking each other, multiple chains of thought, etc., but the cost can quickly get out of hand.
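As a minimal sketch of the “multiple LLMs checking each other” version: call_llm is a hypothetical stand-in for whatever chat-completion client you use, and the prompts and retry budget are illustrative assumptions.

```python
def call_llm(prompt: str) -> str:
    """Hypothetical wrapper around your chat-completion API of choice."""
    raise NotImplementedError

def generate_with_verifier(task: str, max_attempts: int = 3) -> str:
    """Generator proposes, a separate verifier critiques, critique is fed back in."""
    feedback = ""
    draft = ""
    for _ in range(max_attempts):
        draft = call_llm(
            f"Task: {task}\nFeedback on the previous attempt: {feedback or 'none'}\n"
            "Propose a solution."
        )
        critique = call_llm(
            f"Task: {task}\nProposed solution: {draft}\n"
            "Reply APPROVE if it is correct, otherwise list the problems."
        )
        if critique.strip().upper().startswith("APPROVE"):
            return draft
        feedback = critique  # automated stand-in for the human feedback/reroll
    return draft  # out of budget; in a chat UI, you would be the fallback here
```

Each extra checker or retry multiplies the number of model calls, which is exactly where the cost gets out of hand.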
It’s been pretty clear to me, as someone who regularly creates side projects with AI, that the models are actually getting better at coding.
Also, it’s clearly not pure memorization: you can deliberately give them tasks that have never been done before, and they do well.
However, even with agentic workflows, RAG, etc., all existing models seem to fail at some moderate level of complexity: they can create functions and prototypes but have trouble keeping track of a large project.
My uninformed guess is that o3 actually pushes that complexity ceiling up by some non-trivial amount, but not enough to take on complex projects yet.
Do you like transcripts? We’ve got one of those at the link as well. It’s a mid AI-generated transcript, but the alternative is none. :)
At least when the link opens the Substack app on my phone, I see no such transcript.
Is this true?
I’m still a bit confused about this point of the Kelly criterion. I thought that this is actually the way to maximize expected returns if you value money linearly, and that the log term comes from compounding gains.
And that the log-utility assumption is actually a separate justification for the Kelly criterion, one that doesn’t take compounding returns into account.
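For what it’s worth, here’s the textbook derivation as I understand it, where the log shows up from compounding rather than from a utility assumption (a sketch in my own notation, not the post’s):

```latex
% Bet a fraction f of the bankroll each round; win with probability p at odds b:1,
% lose with probability q = 1 - p. After n rounds with W wins and L losses:
\[
  X_n = X_0\,(1 + f b)^{W}\,(1 - f)^{L}.
\]
% Taking the log of this product turns compounding into a sum, so the
% long-run growth rate per round is
\[
  g(f) = \lim_{n \to \infty} \tfrac{1}{n} \log \tfrac{X_n}{X_0}
       = p \log(1 + f b) + q \log(1 - f).
\]
% Maximizing g(f) (set g'(f) = 0) gives the Kelly fraction:
\[
  \frac{p b}{1 + f b} = \frac{q}{1 - f}
  \quad\Longrightarrow\quad
  f^{*} = p - \frac{q}{b}.
\]
```

So the log appears because long-run wealth is a product of per-round factors; maximizing plain expected wealth under linear utility would instead tell you to bet the whole bankroll on any positive-edge bet.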
I was figuring that the SWE-bench tasks don’t seem particularly hard, intuitively. E.g., 90% of SWE-bench Verified problems are “estimated to take less than an hour for an experienced software engineer to complete”.
I mean, fair, but when did a benchmark designed to test REAL software engineering issues that take less than an hour suddenly stop seeming “particularly hard” for a computer?
Feels like we’re being frogboiled.
I don’t think you can explain away SWE-bench performance with any of these explanations
We haven’t yet seen what happens when they turn the verifiable-reward approach behind o3 toward self-play on a variety of strategy games. I suspect that it will unlock a lot of general reasoning and strategy.
Can you say what types of problems they are?
Can you say more about your reasoning for this?
But your answer here seems like a non sequitur? The statement “I believe the goddess is watching over me because it makes me feel better” may be both a very true and very vulnerable statement.
And they’ve already stated the reason that they believe it is something OTHER than “it’s true.”
So why are you trying to argue about the truth value, in a way that may rob them of something they’ve just told you gets them through the day?
You may personally hold a standard for yourself like Beliefs Are For True Things, but trying to force it on others does NOT seem like a good example of “Arguing Well”.