If you get an email from aisafetyresearch@gmail.com, that is most likely me. I also check that inbox weekly, so you can pass a message into my mind that way.
Other ~personal contacts: https://linktr.ee/uhuge
Martin Vlach
EA is neglecting industrial solutions to the industrial problem of successionism.
…because the broader mass of active actors working on such solutions renders those business areas non-neglected?
Wow, such a badly argued (aka BS) yet heavily upvoted article!
Let’s start with Myth #1: what a straw man! Rather than that extreme statement, most researchers likely believe that in the current environment their safety and alignment advances are likely (i.e. with high expected value) helpful to humanity. The point here is that they had quite a free hand, or at least varied options, in picking the environment where they work and publish.
With your examples, a bad actor could see a worthwhile expected value even in a capable system that is less obedient and less truthful. And even if interpretability speeds up development, it would steer such development toward more transparent models; at least there is a naive chance of that.
Myth #2: I’ve not yet met anybody in alignment circles who believed that. Most are quite conscious of the double-edgedness and of your sub-arguments.
https://www.lesswrong.com/posts/F2voF4pr3BfejJawL/safety-isn-t-safety-without-a-social-model-or-dispelling-the?commentId=5vB5tDpFiQDG4pqqz neatly and gently depicts the flaws I point to.
Are you referring to a Science of Technological Progress à la https://www.theatlantic.com/science/archive/2019/07/we-need-new-science-progress/594946 ?
What is your gist of the processes for humanizing technologies? What sources/research are available on such phenomena?
> some OpenAI board members who the Office of National AI Strategy was allowed to appoint, and they did in fact try to fire Sam Altman over the UAE move, but somehow a week later Sam was running the Multinational Artificial Narrow Intelligence Alignment Consortium, which sort of morphed into OpenAI’s oversight body, which sort of morphed into OpenAI’s parent company, and, well, you can guess who was running that.
Pretty sassy abbreviations spiced in there. :D
I expected the hint of
> My name is Anthony. What would you like to ask?
to show that Anthony was an LLM-based android, but who knows?
I mean your article; Anthropic’s work seems more like a paper. Maybe without the “: S” it would make more sense as a reference rather than a title: subtitle notion.
I have not read your explainer yet, but I’ve noted that the title Toy Models of Superposition: Simplified by Hand is a bit misleading: it promises to talk about toy models, which it does not at all. The article is about superposition only, which is great, but not what I’d expect looking at the title.
> that that first phase of advocacy was net harm
typo
Could you please fix your Wikipedia link (currently hiding the word “and” from your writing) here?
> only Claude 3.5 Sonnet attempting to push past GPT4 class
This seems to miss Gemini Pro 1.5 Experimental, the latest version of which was made available just yesterday.
The insensitivity to this cause seems strongly connected to the fairly low interest in longevity throughout (Western/developed) society.
Thought experiment: what are you willing to pay/sacrifice in your 20s or 30s to get 50 extra days of life, versus on your deathbed?
https://consensus.app/papers/ultraviolet-exposure-associated-mortality-analysis-data-stevenson/69a316ed72fd5296891cd416dbac0988/?utm_source=chatgpt
> But largely to and fro,
*from?
Why does the form still seem open today? Couldn’t that be harmful, or waste quite a chunk of people’s time?
Please go further towards maximizing clarity. Let’s start with this example:
> Epistemic status: Musings about questioning assumptions and purpose.
Are those your musings about agents questioning their assumptions and world-views?
And do you wish to improve on your fallacies?
> ability to pursue goals that would not lead to the algorithm’s instability.
A higher threshold than ability, like an inherent desire/optimisation?
What kind of stability? Any of those at https://en.wikipedia.org/wiki/Stable_algorithm? I’d focus more on a sort of non-fatal influence. Should the property be more about the algorithm being careful/cautious?
The https://neelnanda.io/transformer-tutorial-1 link for the YouTube tutorial gives a 404. :-(
> “What, exactly, is the difference between a cult and a religion?”—”The difference is that cults have been formed recently enough, and are small enough, that we are suspicious of them existing for the purpose of taking advantage of the special place we give religion.”
Now I see why my friends practicing the spiritual path of Falun Dafa have “incorporated” as a religion in my state, despite the movement originally denying being classified as a religion, so as to demonstrate that it does not require a fixed set of rituals.
Surprised to see nobody has mentioned microneedling yet. I’m not skilled in evaluating scientific evidence, but the takeaway from https://consensus.app/results/?q=Microneedling%20effectiveness&synthesize=on can hardly be anything other than a clear recommendation of microneedling.
So the Alignment program is to be updated to 0 for OpenAI, now that the Superalignment team is no more? ( https://docs.google.com/document/d/1uPd2S00MqfgXmKHRkVELz5PdFRVzfjDujtu8XLyREgM/edit?usp=sharing )
Honestly, the code linked is not that complicated: https://github.com/eggsyntax/py-user-knowledge/blob/aa6c5e57fbd24b0d453bb808b4cc780353f18951/openai_uk.py#L11
To work around the top-n restriction on returned logprobs, you can supply a logit_bias map to the API.
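A minimal sketch of that workaround, just building the request body rather than calling the API. The model name and token IDs are placeholders; in the OpenAI chat-completions API, `logit_bias` maps token-ID strings to values in [-100, 100] that are added to the logits, so tokens outside the default top-n get pushed into the visible `top_logprobs`:

```python
# Sketch: surface logprobs of tokens outside the default top-n by biasing
# them upward. Token IDs and model name below are illustrative placeholders.

def build_logprob_probe(prompt: str, token_ids: list[int], bias: int = 20) -> dict:
    """Build a chat-completions request body that adds `bias` to each
    token in `token_ids`, pushing them into the returned top logprobs."""
    return {
        "model": "gpt-4o-mini",  # placeholder: any chat model returning logprobs
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 1,
        "logprobs": True,
        "top_logprobs": 20,  # the API maximum for chat completions
        # logit_bias: token-ID string -> bias added to that token's logit
        "logit_bias": {str(t): bias for t in token_ids},
    }

payload = build_logprob_probe("The sky is", [290, 3290])
```

Since the bias shifts the logits before the softmax, the returned logprobs are shifted too; subtracting the bias again only approximates the unbiased values because of renormalization.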
My suspicion: https://arxiv.org/html/2411.16489v1 taken and implemented on the small coding model.
Is it any mystery which of DPO, PPO, RLHF, or fine-tuning was the likely method for the advanced distillation there?